An evaluation of methods for normalizing diffusion‐based covariance operators in variational data assimilation

Developing effective ways to model and cycle the background‐error covariance matrix is an active area of research in data assimilation. An important aspect of this problem when using a filter to model the background‐error correlations is the computation of normalization factors to ensure that the diagonal elements of the modelled correlation matrix are all equal to one. Updating the parameters of a flow‐dependent correlation model on each assimilation cycle requires updating the normalization factors, which is costly using traditional methods such as randomization. In this article, we discuss the normalization problem within the context of a diffusion filter‐based covariance model used for background‐error modelling in a variational data assimilation system for the global ocean. We evaluate various methods for estimating normalization factors when the diffusion tensor of the correlation model is derived from an ensemble of ocean states. Our results show that estimates produced using inexpensive methods derived from analytical considerations of the diffusion equation can have significant errors, especially near boundaries. Estimates obtained using randomization with a small sample size (∼100) are more accurate in a globally averaged sense but are noisy and can have unacceptably large errors locally. Next, we focus on the specific problem of accounting for flow‐dependent correlation parameters in the vertical component of the diffusion operator only, which is especially important near the surface for the assimilation of sea surface temperature observations. Remarkably accurate estimates are obtained by approximating the normalization matrix as a separable product of two normalization matrices: one computed using randomization with the horizontal diffusion operator only and the other computed using randomization with the vertical diffusion operator only. If the parameters of the horizontal component of the diffusion operator are static, then only the normalization factors of the flow‐dependent vertical component need to be recomputed on each cycle. This result is of significant practical interest since the vertical diffusion operator employs an inexpensive direct solver and thus can be applied on each cycle with a large random sample to obtain a good approximation of the normalization matrix.

since the vertical diffusion operator employs an inexpensive direct solver and thus can be applied on each cycle with a large random sample to obtain a good approximation of the normalization matrix.

K E Y W O R D S
background error, correlation operators, covariance operators, diffusion operator, ensemble estimation, ocean data assimilation, normalization factors, variational assimilation INTRODUCTION An ensemble of model forecasts provides flow-dependent information on forecast error that can be used to enrich background-error covariance models in variational data assimilation (VDA). The benefits of using ensembles to define flow-dependent background-error covariances in atmospheric VDA is well documented (Buehner, 2005;Belo Pereira and Berre, 2006;Bonavita et al., 2012;Clayton et al., 2013;Berre et al., 2015;Kleist and Ide, 2015;Bonavita et al., 2016). There is growing evidence of the benefits of ensembles in ocean VDA as well (Daget et al., 2009;Penny et al., 2015;Storto et al., 2018).
Ensembles can be used to define a sample estimate of the background-error covariance matrix or to calibrate parametric representations of the background-error covariances. A covariance matrix can be factored into a correlation matrix and a standard deviation matrix multiplying either side of the correlation matrix. In VDA, it is common to take advantage of this property in order to estimate the background-error correlations and standard deviations separately. Estimating correlations is inherently more challenging than estimating standard deviations because of the larger number of elements to estimate and because of the added constraints of ensuring symmetry and positive semi-definiteness of the resulting correlation matrix.
In VDA, correlation operators are commonly built from filters that possess positive-definite smoothing kernels. Filters can be used to localize sample covariances of background error or to model the background-error correlations themselves. By construction, a filter maps a space into itself and thus preserves the physical units of its input field. Viewed as an integral operator, such as a convolution for a homogeneous filter, it implies that the amplitude of the smoothing kernel acting on a three-dimensional (3D) input field, for example, must have physical dimensions of inverse volume in order to cancel out the volume elements associated with spatial integration over the 3D domain. In other words, the diagonal elements (variances) of the covariance matrix implied by the spatial filter have physical dimensions of inverse volume. The filter variances are clearly not the background-error variances that we wish to specify in data assimilation (that would be dimensionally inconsistent). In order to impose the appropriate background-error variances, a filter-based covariance matrix must first be normalized by its diagonal in order to transform it into a dimensionless correlation matrix, which has 1s along its diagonal.
The diagonal elements of a filter-based covariance matrix are not directly accessible, so techniques are needed to estimate them. When the filter parameters are flow-dependent, the estimation procedure must be repeated on each assimilation cycle, which is prohibitively expensive using a brute-force exact method and hardly affordable using a randomization method. More practical methods exploit the properties of the spatial filter and the structure of the associated filter-based covariance matrix in order to obtain an adequate estimate of the diagonal elements at a reasonable cost.
Correlation models can be defined using spectral filters where a scale-selective damping is performed in spectral space, or using spatial (grid-point) filters where a smoothing operation is performed directly in physical space. Spectral filters are common in meteorological applications of VDA for representing horizontal correlations. In its most basic form, a spectral filter can be interpreted as a spatial filter with an isotropic smoothing kernel. Normalization is comparatively straightforward for spectral filters. For the standard spectral filter, the normalization factors are constant and well approximated by the inverse of a truncated series of modulated spectral coefficients (Parrish and Derber, 1992;Courtier et al., 1998;Rabier et al., 1998). Correlation models based on wavelet filters generalize those based on spectral filters by allowing horizontal correlations to be both scale-and location-dependent (Fisher, 2003;Berre et al., 2015). Chabot et al. (2017) show how accurate normalization factors for a wavelet-based covariance matrix can be computed by applying a modified version of the inverse wavelet transform to the vector of specified wavelet-coefficient variances.
The recursive filter (Lorenc, 1992;Wu et al., 2002;Purser et al., 2003a;2003b) and the closely related class of diffusion filters (Derber and Rosati, 1989;Egbert et al., 1994;Weaver and Courtier, 2001;Yaremchuk and Smith, 2011;Weaver and Mirouze, 2013) are examples of grid-point (spatial) filters. Grid-point filters are especially popular in ocean VDA because they provide a convenient framework for handling land boundaries in ocean models. Techniques for approximately normalizing covariance models built from recursive or diffusion filters can be derived from analytical considerations of diffusion-like equations, but the techniques become increasingly complex when the smoothing kernels are anisotropic and inhomogeneous (Purser et al., 2003b;Purser, 2008a;2008b;Yaremchuk and Carrier, 2012). Extensions of the analytical normalization methods to accommodate boundary effects have also been proposed (Mirouze and Weaver, 2010;Mirouze and Storto, 2016). In this article, we evaluate different normalization techniques in a global version of the NEMOVAR ocean data assimilation system (Mogensen et al., 2012). Some of the techniques are already documented in the published literature, while others are relatively new. An objective of this article is to compare them in a consistent experimental framework.
The organization of the article is as follows. In Section 2, we outline the mathematical problem of estimating normalization factors within the context of the diffusion-based background-error covariance matrix used in NEMOVAR. We review some fundamental theoretical results from the diffusion equation, which provides the starting point from which we derive the approximate expressions of the normalization factors that are evaluated in this article. In Section 3, we describe how the correlation length-scales are estimated from a climatological ensemble of perturbations and how they are used to specify the diffusion tensor in the correlation model. In Section 4, we describe the different normalization methods that have been implemented in NEMOVAR. In particular, we focus on evaluating the accuracy of the normalization factors estimated using inexpensive approximate methods relative to the actual normalization factors determined from a costly randomization computation. In Section 5, we address the specific problem of updating normalization factors when only the correlation parameters of the vertical component of diffusion are evolved from one assimilation cycle to the next. A summary and discussion are given in Section 6.

Basic notation and statement of the problem
The model state variables in the Nucleus for European Modelling of the Ocean (NEMO; Madec and NEMO team 2008) are temperature, salinity, the two components of horizontal velocity, and sea surface height (SSH). The state vector x ∈ R N x contains the discrete values of the state variables on the model grid. In NEMOVAR, the control variables that are estimated through data assimilation are temperature, and the unbalanced parts of salinity, horizontal velocity and SSH. The control vector w ∈ R N w contains the discrete values of the control variables on the model grid where N w = N x if all the unbalanced fields are estimated.
We consider the following standard form of the background-error covariance matrix (Weaver et al., 2005;Mogensen et al., 2012;Balmaseda et al., 2013) where K b ∈ R N x ×N w is a lower triangular matrix that contains linearized balance constraints that relate the control variables to the state variables. Fundamental to the formulation of B (x) ∈ R N x ×N x is the assumption that the control variables are mutually uncorrelated, so that their associated background-error covariance matrix B (w) ∈ R N w ×N w can be modelled as a block-diagonal (univariate) matrix: where C (w) is a correlation matrix, block-diagonal with respect to the control variables, and (w) is a diagonal matrix containing estimates of the background-error standard deviations of the control variables. If the subscript is used to denote a particular control variable, then each block-diagonal element of B (w) is Here, we focus on the three-dimensional variational assimilation system used for the production of ECMWF's Ocean Reanalysis ORAS5 (Zuo et al., 2019). The observations assimilated in that system contain information about temperature, salinity and SSH, but not horizontal velocity. As a consequence, the unbalanced part of horizontal velocity is unaffected by data assimilation and can be ignored. For this particular case, horizontal velocity is determined entirely from the velocity balance relationship (geostrophy) in K b , which becomes a rectangular matrix since N w < N x . In order to provide flexibility in the shape of the correlation functions, C can be defined as a linear combination of univariate correlation matrices C l , l = 1, … , L : where l is a diagonal weighting matrix such that in order to preserve the property that a correlation matrix has 1s on its diagonal. Without loss of generality, we can focus on a particular component C l of C. For clarity of notation, we can then drop the subscript l in what follows. Each correlation matrix is modelled using a diffusion operator L (Weaver and Courtier, 2001). In its standard implicit form, the diffusion operator is where the elements of the matrix A are associated with the discrete representation of the elliptic operator A = I − ∇ ⋅ ∇ involving the identity operator I, gradient operator ∇, divergence operator ∇⋅, and diffusion tensor . The inverse matrix F = A −1 is a diffusion operator and the parameter M is an integer that defines the total number of diffusion steps, which we assume to be even for convenience. The diffusion operator acts as a smoothing operator where the parameters and M control the spatial length-scales and spectral properties of the underlying smoothing functions. The numerical algorithms used in NEMOVAR to apply Equation (2) are outlined in Section 2.2 and described in more detail in Weaver et al. (2016) and Weaver et al. (2018b). The elliptic operator A is self-adjoint with respect to the L 2 (Ω)-inner product in the spatial domain Ω under consideration. We assume that this property is preserved after discretization, which it is using the discretization techniques outlined in Section 2.2. The discrete form of the L 2 (Ω)-inner product, which we call the W-inner product, is defined by a diagonal matrix W of grid-and geometry-dependent weights. Since A is self-adjoint with respect to the W-inner product, the following factorization holds: The matrix LW −1 is a covariance matrix that is symmetric in the usual sense (i.e., with respect to the canonical inner product).
As pointed out by Ménétrier and Auligné (2015), the matrix square-root notation that is often used in data assimilation can be confusing. From Equation (2) the meaning is clear as L 1/2 can be interpreted as applying the diffusion operator over M/2 steps. However, with other diffusion formulations (e.g., those described in Section 2.2.2 and Section 4.5.2), this notation is ambiguous, so we prefer to avoid it and use the matrix V to denote the "square-root" factor of a diffusion matrix. For the standard case (Equation (3)), V = L 1/2 . The general expression of the factored correlation matrix (symmetric with 1s on its diagonal) is where 2 is a diagonal matrix of normalization factors 2 n : N being the total number of grid points for the control variable under consideration. To ensure that C has 1s on its diagonal, the nth element of the normalization matrix should be defined as the inverse of element nn of the symmetric diffusion matrix: When the correlation length-scales are large compared to the resolution of the grid, it can be computationally advantageous to apply the diffusion operator on a coarser grid. This situation arises when there are "large-scale" terms in a multiple-scale formulation of C (Equation (1)) or when the diffusion operator is used to perform covariance localization which tends to involve very broad scales. When the diffusion operator is applied on a grid that is coarser than the native grid, the correlation matrix becomes where the subscript "c" denotes the coarse-grid operators, and T and T T are transfer operators from coarse-to-fine grid and fine-to-coarse grid, respectively. For the coarse-grid formulation, Determining the diagonal elements of the diffusion matrix is not straightforward since, apart from W −1 and W −1 c which are diagonal matrices, the matrix components have complicated structure and are only available in operator form. If the normalization factors are not accurately specified, then the background-error variances will be effectively modified from their original values, which can lead to a degraded analysis.

Theoretical and numerical considerations
The spatial domain under consideration is the 3D global ocean as modelled by NEMO (Madec and NEMO team, 2008). The primitive equations in NEMO are represented in orthogonal curvilinear coordinates (i, j, k) on the sphere where longitude and latitude are continuous and differentiable functions of the horizontal coordinates (i, j) and depth z is a continuous and differentiable function of the vertical coordinate k. Local curvilinear distance in the (i, j, k) coordinate system is defined by the differential elements (ds 1 , ds 2 , ds 3 ) = (e 1 di, e 2 dj, e 3 dk) where (e 1 , e 2 , e 3 ) are scale factors that represent the local deformation of the curvilinear coordinates.
NEMOVAR inherits the numerical framework of NEMO. The discretization techniques are based on centred, second-order-accurate finite differences where the state variables are arranged on an Arakawa C-grid. The ocean mesh is defined by the transformation from (i, j, k) to geographical coordinates ( , , z). The grid points are located at integer or integer and a half values of (i, j, k), so that the mesh on which partial derivatives are evaluated is uniform with unit grid size (Δi = Δj = Δk = 1). Local curvilinear distance Δs m on the grid is then equal to the local scalar factor e m , m = 1, 2, 3. The scale factors define elements of distance, area or volume in the weighting matrix W in Equation (3) depending on whether the diffusion operator is one-dimensional (1D), two-dimensional (2D) or three-dimensional (3D). For the 1D vertical (z), 2D horizontal (h), and 3D horizontal-vertical (hz) diffusion operators, the weighting matrices are, respectively, where N h and N z denote the total number of horizontal grid-points and vertical levels, respectively, and N = N h N z .
In global applications of NEMOVAR, the analysis is performed directly on the tri-polar global grid (known as ORCA) used by NEMO, or on a subsampled version of that grid if the diffusion is performed on a coarse grid as in Equation (5).

2D implicit diffusion
The 2D implicit diffusion-based correlation matrix is formulated as (cf. Equation (4)) , M h is the total number of implicit diffusion steps, and h is the normalization matrix. The matrix A h is the discrete representation of the 2D elliptic operator A h = I − ∇ ⋅ h ∇ in the (i, j) curvilinear coordinate system on the sphere where h is the (horizontal) diffusion tensor. The ocean-land boundary condition is taken to be of Neumann type; that is, the field derivative in the direction normal to the land boundary is set to zero. At each grid-point, h = ( pq ) ∈ R 2×2 is a symmetric positive-definite (SPD) matrix. The off-diagonal elements ( 12 = 21 ) of the tensor allow for anisotropic stretching in directions that are not necessarily aligned with the curvilinear coordinates. In NEMOVAR, the off-diagonal elements are currently neglected in the diffusion operator, which has implications for the estimation of the diffusion tensor, as discussed in Section 3.
The standard procedure in NEMOVAR for applying V h and V T h is as follows. First, the self-adjoint matrices A h and A T h are transformed into a common, unit-preserving symmetric matrix: Then, V h and V T h are reformulated in terms ofÂ h so that it can be inverted using an efficient solver for sparse SPD matrices. In NEMOVAR, we use a linear solver based on the Chebyshev Iteration to build an approximation ofF h = A −1 h =F T h . Furthermore, we apply the same fixed number of iterations in V h and V T h in order to preserve symmetry of C h to machine precision (Weaver et al., 2016). The square-root operators in terms ofF h are

2D×1D implicit diffusion
In NEMOVAR, the conventional diffusion-based correlation matrix for 3D variables is built from a 2D horizontal diffusion operator that is applied separately in each model level k, and a 1D vertical diffusion operator that is applied separately at each horizontal grid-point (i, j). This separation is justified by the large difference in spatial scales between the horizontal and vertical directions. We denote the horizontal and vertical diffusion operators is the inverse of the matrix representing the discretized 2D elliptic operator and M h is the number of 2D implicit diffusion steps.
z ij is the inverse of the matrix representing the discretized 1D elliptic operator and M z is the corresponding number of 1D implicit diffusion steps (which can be different from M h ).
Constructing a self-adjoint 3D diffusion operator L h×z ∈ R N×N from the self-adjoint components L h k and L z ij requires some care. The formulation used in NEMOVAR is outlined below. Let = ( T 1 , … , T N z ) T ∈ R N be a vector associated with a 3D variable where k ∈ R N h contains the values of the variable in model level k. In order to reconcile the different dimensions of the diffusion matrices, we introduce the following extension matrices to map from the reduced spaces R N h and R N z to the full space R N : where the non-zero matrix I N z acts on variables in level k, and e ij = (0, … , 0, 1, 0, … , 0) T ∈ R N h , the non-zero element being located at the ijth entry. The transpose of the extension matrices, E T where W h and W z are defined in Equation (6). Using the property that F h is self-adjoint with respect to theŴ h -inner product and that F z is self-adjoint with respect to theŴ z -inner product, we can express Equations (7) and (8) in the following square-root forms: whereŴ hŴz = W hz (the last relation in Equation (6)). The product L h L z is not self-adjoint with respect to the W hz -inner product, so it cannot be used directly to define a symmetric correlation matrix for 3D variables. Self-adjointness can be imposed by using the square-root forms of L h and L z , forming their product, and interchanging the terms assuming that they commute (which they do not). This leads to several possible formulations involving different expressions for the square-root matrix, which we denote by V (m) h×z : The corresponding correlation matrix is where 2 h×z is the normalization matrix. We refer to these formulations as 2D×1D (m) where the value of m will be added in parentheses when we need to distinguish them.
Two obvious formulations for the square-root matrix are where which involve applying the horizontal and vertical diffusion operators separately in the square-root matrix, but in a different order. Two others formulations are which can be derived from V (1) h×z and V (2) h×z by taking M h = M z = M hz and moving the step counter outside of the product of F h×z and V (4) h×z are preferable to V (1) h×z and V (2) h×z since they allow more coupling between the horizontal and vertical diffusion operators, which results in better smoothness properties, especially near complex bathymetry. This feature is more important than maintaining flexibility in the choice of M h and M z , which is allowed with V (1) h×z and V (2) h×z . We say that V (3) h×z and V (4) h×z are 2D×1D formulations with interleaving. Weaver et al. (2016) illustrate the importance of interleaving to avoid numerical artifacts near irregular coastlines and islands when using a 2D horizontal diffusion model formulated as a product of 1D diffusion operators (their figure 1). Interleaving is important in the 3D formulation for similar reasons, as will be illustrated in Section 5.4. The computational costs of the four formulations above are similar when using the same values of M h and M z . We will come back to the formulations in Equation (11) in Section 5 when discussing methods for estimating normalization factors.
The horizontal diffusion operators in V (m) h×z are applied using the Chebyshev Iteration as outlined in Section 2.2.1. The vertical diffusion operators in V (m) h×z involve inverting small self-adjoint positive-definite matrices A z ij of size N K × N K , which can be done efficiently using a direct solver. Neumann boundary conditions are imposed at the top and bottom boundaries. We can transform A z ij into symmetric formÂ z ij = W z A z ij =Â T z ij , precompute the Cholesky decomposition ofÂ z ij , and then use standard forward-elimination and backward-substitution procedures to applyÂ −1 z ij (Golub and Van Loan, 1996). In terms of the augmented symmetric matrices, the square-root factors are h×z and V (4) h×z .

3D implicit diffusion
Instead of separating the 3D correlation matrix into horizontal and vertical diffusion operator components, we can model it directly, as in Weaver et al. (2018b), as a 3D implicit diffusion operator: , M hz is the total number of 3D implicit diffusion steps, and hz is the normalization matrix. The matrix A hz is the discrete representation of the 3D elliptic operator A hz = I − ∇ ⋅ hz ∇ in the (i, j, k) coordinate system, and hz = ( pq ) ∈ R 3×3 is the horizontal-vertical diffusion tensor where pq = qp . The horizontal component of the tensor, h , contains the elements ( pq ), p, q = 1, 2. In the following, we neglect the anisotropic coupling between the horizontal and vertical directions ( 13 = 23 = 0). The element 33 then corresponds to the vertical diffusion coefficient z introduced earlier with the 2D×1D formulation.
The procedure for applying V hz and V T hz mirrors that used for the 2D implicit diffusion operator, which involves first transforming A hz into a symmetric matrixÂ hz and then approximatingF hz =Â −1 hz using the Chebyshev Iteration. In terms ofF hz , the square-root operators are The 3D formulation is simpler but more costly than the 2D×1D (m) formulations. A major factor influencing the computational cost of the 3D formulation when using ensemble-estimated correlation parameters is the large condition number of A hz caused by large vertical length-scales in the mixed-layer region where the vertical resolution is fine. The large condition number affects the convergence rate of the Chebyshev Iteration. In the 2D×1D formulations, the Chebyshev iteration is only used for the horizontal component and the efficiency of the direct solver used for the vertical component is not affected by the condition number ofÂ z . Weaver and Courtier (2001) and Weaver and Mirouze (2013) discuss analytical solutions of the diffusion equation on the sphere S 2 and derive expressions for the associated positive-definite smoothing kernels that can be interpreted as isotropic covariance functions. Like all isotropic covariance functions on S 2 , those associated with implicit diffusion are defined as a series expansion of Legendre polynomials (Gaspari and Cohn, 1999). Of particular interest here is the expression for the intrinsic variance of the covariance function as the inverse of this variance is the normalization factor needed to transform the covariance function into a correlation function. For the covariance functions on S 2 , the normalization factor is the inverse of an infinite series involving the total wave number of the spherical harmonics. A more convenient, closed-form expression of the normalization factor can be derived by considering the analytical solution of the diffusion equation in the Euclidean space R 2 , which approximates well the solution on S 2 on spatial scales that are small compared to the radius of the Earth and thus relevant in ocean data assimilation. Expressions for the normalization factor associated with diffusion in R 1 and R 3 can also be derived. They are given below since they provide the basis for the approximate expressions of the normalization factors described in Section 4.5.

Analytical expressions for the normalization factors
The analytical solution of the elliptic equation of implicit diffusion in R 2 is well known (Whittle, 1963;Guttorp and Gneiting, 2006). If h is constant, then the solution of the elliptic equation can be expressed as an integral equation over R 2 where its kernel is an isotropic and homogeneous covariance function from the Matérn family. The total number of implicit diffusion steps M h is related to the standard smoothness parameter of the Matérn covariance function. The inverse of the variance of the Matérn function is the normalization factor and is given by is the square of the characteristic length-scale of the Matérn function. For the 1D implicit diffusion operator used in the 2D×1D formulations, we can estimate the normalization factor by examining the solution of the implicit diffusion equation on R 1 with constant diffusion coefficient (Mirouze and Weaver, 2010). The Matérn kernel of the integral solution is an Auto-Regressive (AR) function of order M z , with characteristic length-scale The normalization factor for the vertical component is the inverse of the intrinsic variance of the AR covariance function: As in R 1 , the covariance functions represented by the implicit diffusion operator in R 3 are AR functions whose order is equal to the number of 3D implicit diffusion steps M hz and whose characteristic length-scale hz is The inverse of the intrinsic variance of the M hz th-order AR covariance function provides an estimate of the normalization factor for the 3D implicit diffusion operator: which can be evaluated numerically using the recursive formula Results can be generalized to R d where the expression for the normalization factor after M steps of implicit diffusion with a constant diffusion tensor is where Γ( ) is the Gamma function and d = √ det( ).

Daley length-scale and tensor
The Daley length-scale D is a standard scale parameter used in data assimilation for characterizing isotropic correlation functions c(r) that are at least twice differentiable (Daley, 1991). It is defined in terms of the local curvature of the correlation function at its peak: where ∇ 2 = tr(∇∇ T ) is the d-dimensional Laplacian operator. For Matérn functions in R d represented by M steps of an implicit diffusion operator with constant diffusion tensor, it can be shown that which is valid for M > (d + 2)/2. In the limit as M → ∞ with D held fixed, the Matérn function converges to the Gaussian function and the normalization factor Equation (14) converges to the more familiar expression For anisotropic correlation functions, the component Daley length-scales are defined through a tensor (a d×d matrix), D = H −1 where (Weaver and Mirouze, 2013) is the Local Correlation (Hessian) Tensor (LCT). In the isotropic case, D = D 2 I d where D is a scalar given by Equation (13). For the Matérn functions represented by the d-dimensional implicit diffusion operator with anisotropic diffusion tensor, it is straightforward to show that The Daley tensor is of interest since it can be estimated directly from derivatives of ensemble perturbations. Equation (15) then allows us to relate it to the tensor that needs to be specified in the diffusion operator. This technique has been used for defining the diffusion tensor in the experiments reported here, as described in the next section.

SPECIFICATION OF THE DIFFUSION-MODEL PARAMETERS
In order to evaluate the different methods for estimating normalization factors, we consider a diffusion tensor that is estimated from the five-member (four perturbed + one unperturbed) ORAS5 (Ocean ReAnalysiS 5) ensemble of ocean reanalyses for the low-resolution (LR) global configuration ORCA1-Z42 (approximately 1 degree horizontal resolution and 42 vertical levels). The LR version of ORAS5 is denoted ORAS5-LR. A four perturbed-member ensemble is far too small to determine a statistically robust estimate of the diffusion tensor. In order to reduce sampling error, we estimate the tensor from a climatological ensemble for the 5-year period 2010-2014, focusing on seasonal statistics from December-January-February (DJF). Since the assimilation cycle width in ORAS5-LR was 10 days, the total effective ensemble size used to compute the seasonal climatology is 180 (4 perturbed members × 9 cycles per season × 5 years).
On each assimilation cycle, the LCT (Equation (14)) is first estimated at each grid point from the tensor product of the local gradient of the ensemble perturbations: (16) where ′ denotes an ensemble perturbation for a control variable, centred about the ensemble mean, and is the ensemble standard deviation (Michel et al., 2016). The overbar denotes the ensemble average. The ensemble perturbation for a control variable is computed by removing the balanced component from the ensemble perturbation of the associated state variable. This is done by applying the inverse of the balance operator (K −1 b ) to the state perturbations where the linearization state for K b is provided by the unperturbed member. The unperturbed member is not used to compute the ensemble mean and standard deviation.
The derivatives in Equation (16) are estimated using centred finite differences. The procedure is outlined below. The reader is referred to appendix A of Weaver et al. (2018a) for details on the numerics. For the control variables under consideration, we choose to estimate H at T-points on the Arakawa C-grid. These points correspond to the centre of the grid cells. First, we use a centred difference of the normalized perturbations at adjacent T-points to estimate the first derivatives of the perturbations with respect to i and j. This produces derivative estimates at the ensemble of U-and V-points on the C-grid, which are points located in the middle of the edges of the grid cells. At coastline points, the derivative in the direction normal to the coastline is set to zero. The sample variance of the product of the i-component derivative with itself is then computed directly at the U-point to estimate the diagonal element H 11 . Likewise, the sample variance of the product of the j-component derivative with itself is computed directly at the V-point to estimate H 22 .
The off-diagonal elements H 12 and H 21 are estimated from the sample variance of the cross-product of the perturbation derivatives with respect to i and j. This requires interpolating one of the component derivatives to the point where the other component derivative is defined. Here, we do this by simply averaging the derivatives at the four nearest surrounding points. We estimate H 12 (defined at V-points) and H 21 (defined at U-points) separately.
For 3D variables, the sample variance of the product of the k-component derivative of the normalized perturbations with itself is used to provide an estimate of H 33 at W-points. These points are located at the interface between model layers. Currently, we do not estimate the off-diagonal elements H 13 , H 31 , H 23 and H 32 . The tensor elements at U-and V-points of a given cell are then averaged to the T-point of that cell. Finally, in order to ensure that the resulting tensors are SPD, and hence invertible, the off-diagonal elements H 12 and H 21 are averaged, and are set to zero at grid-points where the determinant is not positive.
The estimated LCT elements are filtered in each model level using a 2D diffusion operator with a constant horizontal filtering scale. The filtering procedure proposed by Michel et al. (2016) is used to preserve positive definiteness of the LCT. The filtering scale is determined using the optimally based method of , where its implementation in NEMOVAR is described in Weaver et al. (2018a). The filtered LCT is then averaged over the months DJF from 2010-2014 to produce a climatological LCT and inverted to produce a climatological Daley tensor D. The climatological estimate of D is converted into an effective diffusion tensor by dividing by the factor (2M − d − 2) in Equation (15). Ten implicit diffusion steps (M = 10) are used with all diffusion models (1D, 2D and 3D). The covariance function associated with this value of M is approximately Gaussian. Finally, the elements of the estimated diffusion tensor are averaged to the appropriate cell interface points where their values are required for the finite difference formulation of the diffusion operator.
As stated earlier, the current diffusion model in NEMOVAR does not account for the off-diagonal elements of the diffusion tensor. This raises the question of how best to define the diagonal elements, given estimates of both the diagonal and off-diagonal elements. A sensible approach is to rescale the diagonal elements such that the determinant of the adjusted diagonal tensor equals the determinant of the estimated non-diagonal tensor: where the elementŝp q on the right-hand side are those estimated from the ensemble. This method has been applied here. Various checks are then applied on the estimated diffusion tensor. In the upper 600 m, which roughly corresponds to the maximum mixed-layer depth, the maximum-allowed value of the vertical length-scale z = √ z is bounded by the maximum mixed-layer depth itself (600 m). This check avoids excessively large vertical length-scales in the mixed-layer region caused by small vertical derivatives in the tensor estimation method. Another check is used to prevent the vertical length-scales from exceeding the local ocean depth, which can occur in shallow-sea regions. The estimated horizontal tensor elements can be artificially large in semi-closed seas and close to boundaries. To prevent this, those elements are scaled such that, at each grid point, the horizontal length-scale h = {det( h )} 1∕4 does not exceed the Euclidean distance to the nearest coastline point. A final check is made in order to eliminate length-scales that are smaller than the local grid resolution. In particular, if the estimated effective local length-scale is smaller than the geometric mean of the local scale factors, then the tensor elements are set to the scale factors themselves. A similar check is done on the vertical diffusion coefficient.
The estimated horizontal tensor elements for temperature in the surface level are shown in Figure 1. The tensor elements are proportional to the square of the correlation length-scales. While it is not the objective of this article to provide a detailed analysis of the estimates, various basic features can be identified. In the Tropics, 22 is smaller than 11 , reflecting well-known anisotropy related to equatorial dynamics. Smaller scales are also evident in western boundary current regions (Gulf Stream and Kuroshio Current, especially for temperature) where eddies and ocean fronts are prominent.
It is interesting to compare these estimates qualitatively with the parametrized horizontal tensor elements used for the actual production of ORAS5-LR. For ORAS5-LR, the same values were used for all variables and at all levels. The parametrized horizontal tensor elements correspond to values of the Daley length-scale equal to D ≈ 222km everywhere except within 15 • of the Equator where the zonal (meridional) components were stretched (reduced) following a cosine dependence (Mogensen et al., 2012). Directly at the Equator, D zonal ≈ 444 km and D merid ≈ 111 km. Compared with the ensemble estimates in Figure 1, the parametrized tensor elements (not shown) are generally larger everywhere, with the exception of the equatorial region where the parametrized and estimated 22 have some similarities. The horizontal Daley length-scales used in ECMWF's high-resolution ocean reanalysis ORAS5, produced with the global configuration ORCA025-Z75 (1/4 degree horizontal resolution and 75 vertical levels), was based on a different parametrization that depends on the Rossby radius. That parametrization produces horizontal tensor elements that decrease with increasing latitude, similar to what is observed in the ensemble-estimated values with ORAS5-LR.
The estimated vertical diffusion coefficient is shown in Figures 2 and 3. In the top level (Figure 2), the vertical length-scales are much larger in the Northern Hemisphere than in the Southern Hemisphere. The large scales are associated with strongly correlated errors in the mixed layer, which is deep in the Northern Hemisphere winter. The vertical length-scales are also large in the lowest levels of the model where the resolution is coarse, but much smaller in the highly stratified thermocline region (Figure 3). For comparison, the parametrized vertical diffusion coefficients used in ORAS5-LR were specified by setting the vertical Daley length-scales to be proportional to the vertical scale factors (e 3 ) where the proportionality constant was taken to be equal to one. This simple resolution-dependent parameterization broadly captures some features observed in the ensemble estimates, except for the mixed-layer region where the parametrized vertical scales are too small. The parametrized vertical scales in the deep ocean are also somewhat underestimated compared to those from the ensemble.

NORMALIZATION FACTOR ESTIMATION METHODS
The constant normalization factors derived from the analytical solutions of the diffusion equation are only crude approximations of the actual normalization factors for diffusion-based correlation models used in practice. There are three basic reasons for this. First, the actual normalization factors will depend on the numerical approximations used to discretize and solve the diffusion equation. Second, realistic diffusion tensors, such as those estimated from an ensemble, are not constant, but rather have considerable 3D spatial structure. Third, normalization factors are strongly affected by solid-wall boundary conditions, especially in a global ocean model where the boundary geometry is complex.
In the following subsections, we describe the different methods that have been implemented in NEMOVAR for estimating normalization factors, and we compare them in terms of their accuracy and computational cost.

Brute-force (exact) method
Let e n = ( 0, … , 1, … , 0 ) T where the value of 1 is located at the nth grid-point. The nth diagonal element of any square matrix can be computed exactly by applying that F I G U R E 2 The vertical diffusion coefficient z (m 2 ) for temperature at 5 m depth (the surface level) for months DJF. The colour palette uses a log 10 scale matrix to e n and extracting the nth element from the resulting vector. Specifically, the inverse of the normalization factor at the nth grid-point of the symmetric diffusion matrix is given by (cf. Equation (4)) where {⋅} n means the nth element of the vector in curly brackets. For the coarse-grid formulation (Equation (5)), the expression would also involve the transfer operator and its transpose. To compute all normalization factors using Equation (18) requires as many applications of the diffusion operator as there are grid points, which is clearly prohibitive for a high-resolution global configuration where the number of ocean points for each 3D variable is on the order of 10 7 . The cost of applying the diffusion operator can be halved by noting that −2 n is equal to the scalar product Equation (19) can be evaluated by applying the adjoint of the diffusion operator over half the number of diffusion steps and then computing the scalar product of the result with itself. In this case, half of the diffusion operations per grid point are replaced by a scalar product (global sum) per grid point. This is generally cheaper than using Equation (18) directly, but still prohibitively expensive to be applied at all grid points. In practice, this method can only be used to compute exact normalization factors at a selected number of points, which can be useful for diagnostic purposes.

Randomization method
The randomization method was proposed by Andersson et al. (2000) for diagnosing the variances of any covariance matrix that can be represented in square-root operator form. Weaver and Courtier (2001) discussed how the method could be used specifically for estimating normalization factors, which we summarize below. Consider a random vector of normally-distributed noise with mean equal to zero and standard deviation equal to one; that is, ∼ N[0, I] where E[ ] = 0 and E[ T ] = I, E[⋅] being the mathematical expectation operator. We can relate to a symmetric positive semi-definite (covariance) matrix AA T using the identity We can use the above identity to compute a sample estimate of the covariance matrix AA T from a set of independently drawn random vectors q ∼ N[0, I], q = 1, … , Q.
Here, we are interested in estimating the variances (diagonal elements) of AA T for the specific case where where • denotes the element-by-element (Schur) product of two vectors, and F I G U R E 3 A vertical section of z (m 2 ) at approximately 35 • N. (b) is an expansion of the top 1,000 m of (a). The colour palette uses a log 10 scale The randomized estimate of the matrix of normalization factors is then The sampling errorṽ e =ṽ − v is unbiased (E[ṽ e ] = 0) and the statistical moments can be expressed in terms of v . In particular, for each component v i , the mean squared error of the estimate is which shows that the relative error of the variance depends on the sample size but not on the true variance, and hence is independent of the resolution of the model. For example, the relative error is 0.02% with a sample size of Q = 10 4 . Note that Equation (21) describes the expected mean squared error of the variance, not the normalization factor (inverse of the variance) for which there is no simple relationship. Randomization is also expensive, but not as prohibitively expensive as the exact method. Randomization is practical for computing accurate estimates of the normalization factors when using correlation model parameters that are static or updated infrequently. Furthermore, there is scope for accelerating the convergence of randomization using carefully constructed input vectors obtained using Hadamard matrices or probing techniques (Bekas et al., 2007;Laeuchli, 2016). While F I G U R E 4 The reference normalization factors (m 2 ) for temperature at 5 m depth (the surface level) with the 2D×1D (3) implicit diffusion operator where the diffusion tensor has been estimated from a climatological ensemble as described in Section 3 and illustrated in Figure 1. The normalization factors have been computed using the randomization method with a sample size of 10 4 . The colour palette uses a log 10 scale this is an interesting possibility, we have not explored it here.

Reference field and error diagnostics
In what follows, we will use the normalization factors computed using randomization with a sample size of 10 4 as the reference field to which estimates generated using approximate methods will be compared. The reference normalization factors for temperature with the standard 2D×1D (3) formulation (Equation (10) with m = 3) are shown in Figures 4 and 5. The spatial structure is similar to that observed in the diffusion tensor elements (Figures 1-3), which illustrates the close connection between the normalization factors and correlation length-scales. Indeed, this is expected from the analytical expression for the normalization factor. Chabot et al. (2017) make a similar observation with a wavelet-based covariance model. Also noticeable are the reduced values of the normalization factors close to land boundaries, which counteract an overestimated amplitude caused by the Neumann boundary condition (Mirouze and Weaver, 2010).
The experiments described in the following sections are summarized in Table 1. To assess the quality of the approximate normalization factor̃2 n , we will compute the relative error where 2 n is the reference. Spatial plots of  n will be displayed only for the temperature normalization factors with the 2D×1D (3) formulation. To provide a global measure of the error, we will also calculate the mean of the absolute value of the relative error (hereafter referred to simply as the mean error), for all the control variables (unbalanced SSH, temperature, unbalanced salinity). The values of  for the experiments in Table 1 are given in Table 2 where they are expressed as a percentage error.

Randomization with a small sample size
A sample size of 100 could be affordable on each assimilation cycle to estimate normalization factors for a fully flow-dependent background-error correlation matrix. The mean relative error with 100 samples is 12% (Experiment 1 in Table 2). However, the randomized estimates are noisy as evident in Figure 6 which displays  n for the temperature normalization factors in the surface level. In places, the error exceeds 100%. A spatial filter can be applied to the randomized estimates to suppress the noise to some extent. We can use the diffusion operator as a filter and adjust the diffusion tensor to select a desired smoothing scale. Here, we have applied a horizontal diffusion filter to the randomized estimates of the variances (inverse normalization factors). At each grid point n, the diffusion tensor is taken to be proportional to the diagonal tensor of squared horizontal scale factors:

F I G U R E 5 A vertical
section at approximately 35 • N of the reference normalization factors (m 2 ) for temperature. (b) is an expansion of (a) in the top 1,000 m. The colour palette uses a log 10 scale As in the correlation operator, we set M = 10, which makes the filter approximately Gaussian. While the filter reduces the noise, it has a detrimental effect near land boundaries where the errors are noticeably increased (not shown). The normalization factors are overestimated at grid points closest to the boundary, and underestimated at grid points further from the boundary. The larger errors near the boundaries are responsible for an increase in the average error (Experiments 2 and 3 in Table 2). In Experiment 2, the filter scale is taken to be equal to the local scale factor ( = 1). Doubling the filter scale ( = 2) suppresses the noise further but exacerbates the problem near land boundaries and leads to a further increase in the error (Experiment 3).

4.5
Analytical and empirical methods

Zeroth-order analytical approximation
An approximate expression for the normalization factors can be derived from the analytical expression for the normalization factors by replacing the constant diffusion tensor with the spatially dependent diffusion tensor. We refer to this as the zeroth-order approximation. For 2D diffusion and the two basic formulations of 3D diffusion, the zeroth-order expressions are We can expect these expressions to provide a good approximation of the normalization factors in regions far from land boundaries and where the spatial variations in the diffusion tensor are weak. However, these expressions are unlikely to be adequate in regions close to boundaries and where the diffusion tensor varies substantially between grid points, as occurs with a tensor estimated from ensembles. This is confirmed by Figures 7 and 8, which show the relative error in the normalization factors for temperature when using Equation (26). The spatial structure of the error roughly matches that of the vertical diffusion coefficient in Figures 2 and 3. The normalization factors are largely overestimated in the upper ocean levels and near bottom bathymetry (the errors are positive) where they are strongly influenced by the Neumann boundary condition, the effects of which are not accounted for in the analytical approximation. The errors are especially large (over 1,000%) in the upper levels in the Northern Hemisphere where the vertical diffusion coefficients are large. The mean errors in Table 2 (Experiments 4 and 5) are significantly larger for temperature and unbalanced salinity (3D fields) than for unbalanced SSH (a 2D field).

TA B L E 2
The spatial average of the absolute value of the errors in the approximate normalization factor normalized by the reference normalization factor (Equation (22)) for the experiments listed in Table 1 Expt

F I G U R E 6 Experiment 1.
The relative error between the reference normalization factors for temperature in the surface level ( Figure 4) with the 2D×1D (3) implicit diffusion operator and those produced using the randomization method with a sample size of 100

Diffusion formulation with Neumann and Dirichlet boundary conditions
Near land boundaries, the Neumann boundary condition prevents flux exchanges across the boundary, causing the amplitude of the covariance function to increase there.
The opposite occurs with a Dirichlet boundary condition where the amplitude is diminished near land boundaries in order to satisfy the condition that the field is zero at the boundary. By examining the solution of the continuous, 1D diffusion equation with constant diffusion coefficient, in the presence of an isolated, straight boundary, Mirouze and Weaver (2010) show analytically that the F I G U R E 7 Experiment 4. The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature in the surface level when using the analytical approximation given by Equation (24) F I G U R E 8 Experiment 4. The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature at approximately 35 • N when using the analytical approximation given by Equation (24). Note that the scale in the colour palette is one order of magnitude smaller than that used in Figure 7 correct amplitude can be obtained by redefining the correlation operator as an average of two diffusion operators, one that employs the Neumann (N) boundary condition and the other that employs the Dirichlet (D) boundary condition. We call it the ND formulation.
The horizontal implicit diffusion operator with the ND formulation can be written as where the different boundary conditions are denoted by subscripts N and D. The corresponding ND formulation of the correlation matrix involves a rectangular "square-root" factor: A similar expression follows for the ND formulation of the correlation matrix with the 3D implicit diffusion operator (Section 2.2.3).
For the 2D×1D formulations, we use the approach outlined in Section 2.2.2 to construct a self-adjoint 3D diffusion operator L ND h×z (cf. Equation (9)) from the product of L ND h and the ND formulation of the vertical diffusion operator: This yields a family of 3D correlation matrices (m = 1, … , 4): In particular, for the 2D×1D formulations without interleaving (m = 1) and with interleaving (m = 3), the square-root factors are which involves four 2D×1D diffusion operations compared to two 3D diffusion operators in the corresponding (non-separable) 3D ND formulation. Similar expressions hold for m = 2 and m = 4, by switching the order of the horizontal and vertical diffusion operators. The ND formulation can be restricted to the vertical diffusion operator only, which we call the NDz formulation. Assuming Neumann boundary conditions for the horizontal diffusion operator, this yields for m = 1 and m = 3 which involves two diffusion operations as in the 3D ND formulation. The mean errors with and without the ND formulation are reduced by 22% for temperature and 32% for unbalanced salinity, but are similar for unbalanced SSH (Experiments 6, 7 and 8 in Table 2). For unbalanced SSH, the errors remain large near land boundaries, probably due to a combination of the effects of irregular boundary geometry and large spatial variations in the tensor. For temperature and unbalanced s, the errors are predominantly reduced in the upper ocean and near bathymetry (cf. Figures 7-10). The mean errors for the experiments with the full ND and NDz formulations (cf. Experiments 6 and 7) are very similar, indicating that the ND formulation works best with the vertical diffusion operator. This is likely because the assumptions used to derive the ND formulation are better satisfied by the vertical diffusion operator, especially in open ocean areas near the surface where boundary geometry is simple and where the vertical diffusion coefficient is approximately constant. Overall, however, the errors remain unacceptably large.

Analytical boundary correction
An obvious drawback with the ND formulation is the additional cost that it entails. Building on the theoretical analysis of Mirouze and Weaver (2010), Mirouze and Storto (2016) proposed a simple analytical correction to the normalization factor near the boundary as an alternative to the more costly ND approach. With the Neumann boundary condition, their analysis suggests that the normalization coefficient should be corrected by a multiplicative factor = 1∕{1 + c(2 r bndy )} where c(⋅) is the Matérn correlation function corresponding to the approximate kernel of the normalized diffusion operator and r bndy is the Euclidean distance to the boundary. For example, directly at the boundary (r bndy = 0), equals 1/2 to compensate for the doubling of the amplitude there with the Neumann boundary condition for the 1D case when the diffusion coefficient is constant (appendix B in Mirouze and Weaver (2010)). Far from the boundary (r bndy ≫ ), the correction factor is approximately equal to one. With the Dirichlet boundary condition, the correction factor is = 1∕{1 − c(2 r bndy )}, which is singular directly at the boundary point and therefore must be regularized in numerical applications by adding a small number to the denominator. The correction for the Neumann boundary condition in the 2D implicit diffusion operator yields approximate normalization factors where c h is the correlation function implied by the diffusion operator, and r bndy n is the Euclidean distance to the nearest coastline point. For 2D implicit diffusion, the correlation function used to compute the correction factor F I G U R E 9 Experiment 6. The relative error in the normalization factors for temperature in the surface level when using the analytical approximation given by Equation (24) with a ND formulation of the 2D×1D (3) implicit diffusion operator F I G U R E 10 Experiment 6. The relative error in the normalization factors for temperature at approximately 35 • N when using the analytical approximation given by Equation (24) with a ND formulation of the 2D×1D (3) implicit diffusion operator. Note that the scale in the colour palette is one order of magnitude smaller than that used in Figure 9 should be a Matérn function involving the modified Bessel function of the second kind (Weaver and Mirouze, 2013). For simplicity, we approximate this function by a Gaussian function, which is adequate for the value of M h = 10 considered.
For the 2D×1D diffusion formulations, the correction gives approximate normalization factors where c z is an M z th-order AR function and z bndy n is the Euclidean distance to the nearest upper or bottom boundary point. The correlation function c h is the same as the one used in the correction factor for the 2D diffusion operator. For the 3D diffusion operator, the corrected normalization factors are given by Equation For unbalanced SSH, the boundary correction term results in a significant reduction in the error near land boundaries compared to both the zeroth-order and ND formulations (not shown). This improved error reduction is also visible in the mean error (Experiments 9 and 10 in Table 2). The reduction of the mean error is more impressive for temperature and unbalanced salinity where it is up to 8% better than with the ND and NDz formulations (compared to 3% better for unbalanced SSH). Figures 11 and 12 illustrate that the boundary correction term has a similar impact to the ND formulation near the surface, but produces noticeably smaller errors near F I G U R E 11 Experiment 9. The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature in the surface level when using the analytical approximation (Equation (26)), involving the boundary correction term to account for the effect of the Neumann boundary condition F I G U R E 12 Experiment 9.
The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature at approximately 35 • N when using the analytical approximation (Equation (26)), involving the boundary correction term to account for the effect of the Neumann boundary condition. Note that the scale in the colour palette is one order of magnitude smaller than that used in Figure 11 bottom bathymetry. The errors also change sign over large areas in the ocean interior, probably due to the large vertical length scales in the mixed layer and bottom boundary regions that are used in the correction term.

Smoothing the diffusion tensor in the zeroth-order expression
By considering an asymptotic expansion of the solution of the diffusion equation for the case of a slowly and smoothly varying diffusion tensor, Purser et al. (2003b) showed that a better approximation can be made by smoothing det( ) in the analytical expression of the normalization factor. In particular, their analysis suggests that det( ) should be smoothed with the square root of the diffusion operator that is used in the correlation operator. For a Gaussian filter, applying the square root of the diffusion operator is equivalent to applying the complete diffusion operator, but with the tensor diffusivity equal to one half of that used in the correlation operator. Rather than using precisely one half of the tensor diffusivity, Yaremchuk and Carrier (2012) suggest treating this factor as a free parameter that can be tuned empirically. Also, they apply the smoothing operator to the inverse of the analytical normalization factor (the filter variance), which is equivalent to smoothing the inverse of the square root of det( ). We found that this approach can produce numerical artifacts because the smoothed filter variances could attain excessively small values. Smoothing the analytical normalization factors directly (not their inverse) produces better results in our experiments. This is similar to what was proposed by Purser et al. (2003b), but here we smooth the square root of det( ) instead of det( ).
Taking into account both smoothing of the normalization factors and the Neumann boundary condition correction term, the approximate normalization factors for the 2D implicit diffusion operator are h is the 2D diffusion operator (without normalization) used in C h but with the tensor diffusivity given by where is a tunable parameter with 0 < < 1. Likewise, for the 3D implicit diffusion operator, the approximate normalization factors become wheren hz = ({̂2 hz } 1 , … , {̂2 hz } N ) T , L s hz is the 3D diffusion operator (without normalization) used in C hz , but with the tensor diffusivity reduced according to Equation 28.
For the 2D×1D formulation, we use a slightly different procedure in which the smoothing is performed separately on the horizontal and vertical tensor contributions to the analytical normalization factor: In terms of the mean error, the inclusion of the smoothing operator has a slightly negative impact on the accuracy of the normalization factors for all variables (Experiments 11 and 12) compared to the experiments with only the boundary correction (Experiments 9 and 10). For these experiments, has been set to 0.33 for the horizontal diffusion tensor and 0.5 for the vertical diffusion tensor, which are values suggested by the theoretical relation given by Yaremchuk and Carrier (2012) (their equation 12): where d is the dimension of the space. Values of between 0.2 and 0.5 have also been tested, but result in only modest changes to the mean error. Figure 13 shows that, near the upper boundary, the errors are still large as in Figure 12 (note that the scales of the vertical axes of these two figures are different). Switching the boundary condition from Neumann to NDz (ND) in the 2D×1D (3D) smoothing operator results in substantially reduced errors in this region as illustrated in Figure 14. The mean errors of the normalization factors for temperature and unbalanced salinity are reduced as well (Experiments 13 and 14 in Table 2) compared to the mean errors from the experiments without smoothing (Experiments 9 and 10) and with smoothing using the Neumann boundary condition (Experiments 11 and 12).

Bias correction
The approximate expressions for the normalization factors presented thus far in Section 4.5 are derived from purely theoretical considerations of the diffusion equation. As such, they do not account for effects on the normalization factors resulting from numerical approximations. One notable numerical approximation is the modest convergence criterion used with the implicit solver. On each diffusion step, an elliptic equation is solved approximately using a fixed number of iterations of the Chebyshev Iteration algorithm. The number of iterations is chosen such that the 2-norm of the residual is reduced by three to four orders of magnitude compared to the 2-norm of the right-hand side (Weaver et al., 2016;Weaver et al., 2018b). We can expect that the modest accuracy threshold of the solver will bias the normalization factors relative to those deduced from theory. We can attempt to estimate the bias in the normalization factors using linear regression. Let us assume that the normalization factor at each grid point can be described by the linear model̃2 wherê2 n is our "best" analytical estimate of the normalization factor, and a and b are adjustable constants. Let us also assume that the exact normalization factors 2 k are available at a selected number of p = 1, … , P grid points.

F I G U R E 13 Experiment 11.
The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature at approximately 35 • N when using the boundary correction term to account for the effect of the Neumann boundary condition and smoothing of the zeroth-order analytical normalization. The vertical section displays the upper 1,000 m F I G U R E 14 Experiment 13. As Figure 13, but when the diffusion filter used for smoothing the zeroth-order analytical normalization uses the NDz formulation We can use the exact method (Equation 18) to compute 2 p at the P points. Using the method of least squares, we can then seek a and b that minimize Carrying out this operation leads to expressions for a and b which, when substituted back in Equation 31, give the bias-corrected estimate of the normalization factors: where ( ⋅ ), var(⋅) and cov(⋅ , ⋅) denote sample estimates of the mean, variance and covariance of the quantities in parentheses, computed over the P points. In Experiment 15, the analytical normalization factors based on Equations (27) and (30) are bias-corrected using Equation (32). The cost of the method is dominated by the computation of the exact normalization factors at the regression points, especially for 3D variables. Here, we choose a sample size of 50 for all variables (2D and 3D), which provides a better sampling of normalization factors for unbalanced SSH than temperature and unbalanced salinity. In order to limit boundary effects, the regression points are selected from open areas of the main ocean F I G U R E 15 Experiment 15. The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature at approximately 35 • N when using bias-corrected analytical normalization factors (Equation (32)). The analytical normalization factors have been computed using Equation (30), which involves both a boundary correction term to account for the effect of the Neumann boundary condition and smoothing of the zeroth-order analytical normalization factors using the NDz formulation. basins (Pacific, Atlantic, and Indian Ocean basins, as well as the Circumpolar Current region). For temperature and unbalanced salinity, the points are chosen to sample the mixed layer (45 m) and main thermocline (165 m).
Bias correcting the analytical normalization factors leads to further reduction of the mean error for unbalanced SSH compared to all other experiments ( Table 2). The improvement is visible in most areas, although in some boundary areas (close to Antarctica and in northern Canada), the relative error is increased and changes sign (not shown). In these areas, it is likely that the bias is dominated by the effects of boundary geometry (not solver precision), which is difficult to capture with the simple regression model Equation (31) and is not sampled by the points used for regression in this experiment. Compared to Experiment 13, results for temperature produce a slight increase in the mean error, while those for unbalanced salinity show a more substantial increase, probably for the reasons already mentioned. Figure 15 shows modest improvement in the temperature normalization factors in the ocean interior (cf. Figure 14).

NORMALIZATION TO ACCOUNT FOR FLOW-DEPENDENT VERTICAL CORRELATIONS
Correct specification of the vertical correlations of background error in the upper ocean is important for properly assimilating surface observations such as Sea Surface Temperature (SST) and Sea Surface Salinity (SSS). Near the surface, the vertical correlations are sensitive to the mixed-layer depth which can exhibit large variations on time-scales comparable to the width of the assimilation window. Capturing flow dependence in the background-error vertical correlations is highly desirable. Laloyaux et al. (2018) illustrate this point in a coupled data assimilation framework.
Randomization is too costly to be used with the complete 2D×1D diffusion matrix to recompute normalization factors on each assimilation cycle. Furthermore, results from the previous section indicate that the analytical-based normalization methods are not sufficiently accurate. In this section, we investigate variants of the randomization procedure that would allow us to update the normalization factors when the vertical correlation parameters change from one cycle to the next but the horizontal correlation parameters are held fixed to their climatological values.

Look-up table: mixed-layer dependent vertical length-scales
One way to define flow-dependent background-error correlations, without the need for an ensemble, is to parametrize them in terms of the (flow-dependent) background state. This approach ignores uncertainty in the background state in defining the correlations. The Met Office applies this technique in NEMOVAR to define flow-dependent vertical correlation length-scales in the mixed-layer region. There, the vertical correlation length-scales are assumed to be equal to the mixed-layer depth itself (Waters et al., 2015), a parametrization that is supported by the ensemble-estimated vertical diffusion coefficients shown in Figure 3. Below the mixed-layer region, the vertical length-scales are taken to be proportional to the local vertical scale factor, with a smooth transition between the two parametrizations near the base of the mixed layer. Waters et al. (2015) describe an approximate method to estimate the normalization factors for this specific parametrization, which relies on a pre-computation of a look-up table of normalization factors. The look-up table is constructed by applying randomization with a large sample size to a set of diffusion models where each model uses a parametrized vertical diffusion tensor with a different mixed-layer depth. In constructing the look-up table, the mixed-layer depth is taken to be the same at all horizontal grid points. On each assimilation cycle, the normalization factors at each grid point are then selected from the look-up table according to the value of the mixed-layer depth at that grid point. The method is approximate since the look-up table is constructed assuming a constant mixed-layer depth whereas the mixed-layer depths in the background state will vary between horizontal grid points. In order to reduce the error in this approximation, the mixed-layer depths are smoothed in each level using several iterations of a Shapiro filter.
In principle, we could apply a similar technique when using flow-dependent vertical correlations estimated from an ensemble, where normalization factors at different levels are computed for a range of typical vertical length-scales. However, the requirement that the horizontal variations of the vertical length-scales be sufficiently smooth for this to be an accurate approximation is a significant drawback, as is the system overhead of having to manipulate and access information from many large files that constitute the look-up table. The alternative methods described in the next subsection could also be applied to the parametrization described in this subsection.

2D×1D diffusion with accurate normalization using randomization
The computational cost of applying the vertical diffusion operator in V z is significantly cheaper than applying the horizontal diffusion operator in V h . For the global configuration used here, the difference in wall-clock time is more than a factor of ten when using a massively parallel domain decomposition with 12×12 horizontal subdomains (run on four nodes × 36 processors per node) and Message Passing Interface (MPI) communications. There are no MPI communications required when applying V z as each subdomain contains the entire water column. On the other hand, local MPI communications are required before each diffusion step in V h , to update the halos on the lateral boundaries that connect adjacent subdomains.
We can exploit the difference in cost between V z and V h to reduce the cost of randomization for the specific formulation corresponding to m = 2 in Equation (11) (2D×1D (2)). The random vectors are generated using Equation (20): wherêq = V h W −1∕2 hz q and q = 1, … , Q (with Q large). We have added the subscript f to V f z to emphasize that the vertical diffusion operator, unlike V h , is flow-dependent. From the order of the operations in Equation (33), it is clear that the static vectorŝq, associated with the costly horizontal diffusion operator, need only be computed once as they can be reused as the input for V f z when it is updated. This means that the cost of randomization on each cycle is determined by V f z , which is affordable even with a very large sample size (Q ∼ 10 4 ) to obtain accurate normalization factors.
Nevertheless, there are two drawbacks with this approach. First, in order to produce accurate normalization factors, a large number of random vectorŝq must be precomputed and stored to file for each 3D control variable, and then retrieved on each assimilation cycle. With 10 4 random vectors, two 3D control variables (temperature and unbalanced salinity) and storage in single precision, this amounts to around 350 gigabytes of file-space that is required for ORCA1-Z42. With ORCA025-Z75, the required file-space is about 10 terabytes, which can be further reduced to about 5 terabytes using NetCDF compression tools. There is clearly a penalty to be paid when having to manipulate files of this size in an operational suite, even though it is generally feasible with modern file systems. Another drawback is that we are forced to sacrifice interleaving in order isolate V f z in Equation (33). The impact of interleaving will be assessed in Section 5.4.

Separate randomization for 1D and 2D diffusion
Another approach is to assume that the normalization matrix for any of the 2D×1D and 3D formulations can be specified as a product of a normalization matrix for the horizontal diffusion component and a normalization matrix for the vertical diffusion component where each matrix is estimated separately using randomization. As with the previous approach, this would allow us to update only the normalization factors for the time-evolving vertical component to reduce computational cost. However, F I G U R E 16 Experiment 17.
The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature in the surface level when using normalization factors approximated as a product of factors estimated separately for the horizontal and vertical diffusion components using randomization with a sample size of 10 4 . Note that the error scale is a factor of 10 smaller than the one used in the corresponding error plots for the analytical estimates unlike the previous approach, there is no need to store a large number of random vectors, which is clearly an advantage. The purpose here is to evaluate the accuracy of this approximation.
Using the notation from Section 2.2.2, we can formalize the estimation procedure as follows. First, we separate the randomization algorithm into two steps: one for computing random vectors h q using the horizontal diffusion operator, and one for computing random vectors z q using the vertical diffusion operator: The estimates of the normalization factors for the horizontal and vertical diffusion components are 2 h =̃2 h W z and 2 z =̃2 z W h wherẽ2 h and̃2 z are the sample estimates built from h q and z q , respectively. The term W z in 2 h compensates for the vertical scale factors in W hz in Equation (34), while the term W h in 2 z compensates for the horizontal scale factors in W hz . These terms are necessary to avoid double-counting of the volume elements. The product 2 h 2 z =̃2 h̃2 z W hz ≈ 2 h×z then provides an estimate of the normalization matrix for C h × z (Equation (10)). Figures 16 and 17 show that this procedure leads to remarkably accurate estimates, except in a few isolated areas near coastlines and bottom bathymetry. Furthermore, the mean errors for both temperature and unbalanced salinity are notably smaller for the 2D×1D formulation with interleaving than without interleaving (4% compared to 7 to 8%). For the latter, the largest errors are also confined to coastlines and bottom bathymetry, but are more widespread than those obtained with interleaving (not shown). This is a positive result as interleaving is an important feature that we wish to retain, as illustrated in the next subsection.

Correlation structures
Up until now, we have focused on evaluating the accuracy of the amplitude of the correlation functions for different formulations of the correlation matrix. In this subsection, we examine the correlation structures themselves. All formulations produce similar structures in the open ocean (not shown). Near bathymetry, however, they can produce quite different structures as illustrated in Figure 18. Here, we have applied different formulations of the correlation operator to a delta function located at the tip of a sea mount in the Northeast Pacific Ocean (coordinates (i, k) = (99, 25)). It is worth highlighting a number of features. First, the 2D×1D formulation with interleaving produces very similar results to the non-separable 3D formulation (cf. Figure 18a and b), but is substantially cheaper. The wall-clock time required to apply the 2D×1D diffusion operator is 20% less when using a 12×12 decomposition run on 4 nodes × 36 processors/node. The 3D formulation would have advantages over the 2D×1D formulation if rotational anisotropy between the horizontal and vertical directions is accounted for in the diffusion tensor, which is currently not the case. Second, the 2D×1D formulation without interleaving (middle left panel) produces unphysical correlations on either side of the sea mount, with the correlations one level up from the source point (points (98, 24) and (100, 24)) being distinctly F I G U R E 17 Experiment 17. The relative error in the normalization factors for the 2D×1D (3) implicit diffusion operator for temperature at approximately 35 • N when using normalization factors approximated as a product of factors estimated separately for the horizontal and vertical diffusion components using randomization with a sample size of 10 4 larger than those directly adjacent to the source point (points (98, 25) and (100, 25)). Interestingly, this artifact is corrected using approximate normalization (Figure 18d). The correlation structures produced with interleaving and approximate normalization are physically sensible but somewhat underestimated compared to those with accurate normalization (Figure 18e, f).
Next, we examine the sensitivity of the correlations produced near the surface to the upper boundary condition. Figure 19 illustrates the response of the 2D×1D correlation operator to a delta function applied in the uppermost model level. The large vertical extent of the correlations throughout the mixed-layer region is clearly visible. Figure 19a, b compare the correlations produced with Neumann boundary conditions (a) and the ND formulation using Equation (25) where it is applied with the vertical diffusion operator only (NDz; (b)). In both cases, the normalization has been computed accurately. The correlations with Neumann boundary conditions penetrate deeper than those with the NDz formulation. This is an undesirable consequence of normalization, which corrects the amplitude of the correlation function at the expense of affecting the shape of the correlation function. This effect was illustrated by Mirouze and Weaver (2010) in a simple 1D framework (their figure 8). The NDz formulation prevents this behaviour but doubles the cost of the correlation operator. However, this extra cost may be warranted given the importance of the vertical correlations near the surface for correctly assimilating surface observations. Finally, the NDz formulation with approximate normalization does not change the spatial structure of the correlations compared to those obtained with accurate normalization, but does tend to overestimate them at most grid points (Figure 19c,d). The mean errors for temperature and unbalanced salinity are slightly larger using approximate normalization with the NDz formulation than with the standard formulation that employs Neumman boundary conditions (cf. Experiments 17 and 18 in Tables 1 and 2).

SUMMARY AND DISCUSSION
Accounting for flow-dependent background-error correlations with a diffusion operator requires an efficient method to estimate the diagonal of the diffusion-based covariance matrix in order to compute normalization factors. In this article, we have described several general techniques for this purpose. We have evaluated them in a global ocean configuration of the NEMOVAR data assimilation system. The principal parameters of the diffusion operator are the elements of the diffusion tensor, which are related to the spatial length-scales of the underlying covariance functions. These are key parameters that largely influence the amplitude and spatial structure of the normalization factors. Here, they have been estimated from a climatological ensemble of perturbations from ECMWF's low-resolution ocean reanalysis ORAS5-LR.
In the first part of the article, we evaluated inexpensive methods derived from analytical considerations of the diffusion equation or based on randomization with a small sample size. We can summarize the main results from those experiments as follows.
• The normalization factors estimated from the analytically-based methods can have significant errors and are generally not accurate enough for operational applications with global ocean models. Randomization estimates with a small sample (∼100) are more accurate in a globally averaged sense but are too noisy and can attain large errors locally.
• The analytically-based methods are useful for quick testing of developments to the correlation model where high accuracy is generally not required. Among the different approximations that were tested with the ensemble-derived tensor, overall best results were obtained using smoothed versions of the analytical expressions of the normalization factors together with a correction term near boundaries to compensate for the amplification effect of the Neumann boundary condition (Equations (27), (29) and (30)). • The main limitation of the analytically-based methods for global ocean data assimilation is their inability to produce sufficiently accurate normalization factors near land boundaries, especially where the boundary geometry is complex. The more costly ND formulation of the correlation model, involving a combination of Neumann and Dirichlet boundary conditions, provided some improvement but not enough to justify the additional computational cost that the ND formulation entails. The ND formulation is most effective near the upper boundary in the open ocean where the assumptions underlying the method are best satisfied.
• Coastlines and bathymetry are also problematic when using the diffusion operator as a filter for either suppressing sampling error from randomization estimates or for smoothing the diffusion tensor in the analytical approximation of the normalization factors. Further research aimed at improving the filter response near irregular land boundaries is desirable.
• Results using linear regression to bias correct the analytical normalization factors were positive for unbalanced SSH but somewhat mixed for temperature and unbalanced salinity. This is perhaps not surprising given the simplicity of the regression model that was used, especially for 3D variables. There is clearly scope for using more sophisticated regression models, which could include, for example, predictors based on the elements of the diffusion tensor and their derivatives. This approach has been tested in an idealized 2D configuration, with positive results. Additional predictors to handle boundary geometry and possibly other factors would be needed to apply this method to global ocean configurations. In general, the problem lends itself to a machine learning strategy whereby a regression model is trained using a set of accurate normalization factors (computed using randomization) for different diffusion tensors estimated from an ensemble.
• The normalization methods described in this article are likely to provide better accuracy for diffusion-based covariance operators in atmospheric models where there are no complicating effects from irregular land boundaries.
Based on the above results, we conclude that, for global ocean applications, only randomization with a large sample (on the order of 10 4 ) provides a robust method for accurate normalization, albeit at a substantial cost especially with 3D diffusion operators. This clearly has implications on the degree of flow dependence that can be affordably accounted for by a background-error correlation model in an operational high-resolution global data assimilation system for which there are tight constraints on computer resources and on timeliness for the delivery of analyses. A reasonable compromise could be to specify the background-error correlations from a seasonal climatology of ensemble perturbations, which could be updated occasionally to capture the slow evolution of background error associated with, for example, changes in the observing system.
In the second part of the article, we examined methods for updating normalization factors more frequently for the special case where only the vertical component of the diffusion operator is made fully flow-dependent. Capturing flow dependence in the vertical correlations of background error is important for correctly assimilating surface observations such as SST and SSS which are strongly influenced by surface mixed-layer processes. With a moderate number of ensemble members, we can use the sample covariance of the vertical derivative of the ensemble perturbations to produce a relatively robust statistical estimate of the vertical correlation length-scales on each cycle.
We showed that normalization factors can be estimated with good accuracy by approximating them at each grid point as a product of two normalization factors: one computed using randomization with the horizontal diffusion operator only and the other computed using randomization with the vertical diffusion operator only. With this approximation, the mean relative error was only 4%, with the largest errors being concentrated in a few isolated regions near bottom bathymetry and coastlines. The magnitude of the errors is likely well within the uncertainty of our best estimates of the true background-error variances. This result is of significant practical interest as the vertical diffusion operator is inexpensive and thus can be applied on each cycle with a large sample of random vectors to obtain an accurate estimate of the normalization factors for the vertical component. These factors can then be combined with the normalization factors for the static horizontal component to obtain a good approximation of the complete normalization matrix. Recent experiments indicate that this approximation also works well with the higher-resolution configuration ORCA025-Z75.