Resolution of sharp fronts in the presence of model error in variational data assimilation

Abstract

We show that the four-dimensional variational data assimilation method (4DVar) can be interpreted as a form of Tikhonov regularization, a very familiar method for solving ill-posed inverse problems. It is known from image restoration problems that L1-norm penalty regularization recovers sharp edges in the image more accurately than Tikhonov, or L2-norm, penalty regularization. We apply this idea from stationary inverse problems to 4DVar, a dynamical inverse problem, and give examples for an L1-norm penalty approach and a mixed total variation (TV) L1L2-norm penalty approach. For problems with model error where sharp fronts are present and the background and observation error covariances are known, the mixed TV L1L2-norm penalty performs better than either the L1-norm method or the strong-constraint 4DVar (L2-norm) method. A strength of the mixed TV L1L2-norm regularization is that in the case where a simplified form of the background error covariance matrix is used it produces a much more accurate analysis than 4DVar. The method thus has the potential in numerical weather prediction to overcome operational problems with poorly tuned background error covariance matrices. Copyright © 2012 Royal Meteorological Society

1. Introduction

Data assimilation is a method for combining model forecast data with observational data in order to forecast more accurately the state of a system. One of the most popular data assimilation methods used in modern numerical weather prediction (NWP) is four-dimensional variational data assimilation (4DVar) (Sasaki, 1970; Talagrand, 1981; Lewis et al., 2006), which seeks initial conditions such that the forecast best fits both the observations and the background state (which is usually obtained from the previous forecast) within an interval called the assimilation window. Currently, in most operational weather centres, systems and states of dimension of order 10^8 or higher are considered, whereas there are considerably fewer observations, usually of order 10^6 (for reviews on data assimilation methods see Daley, 1991; Nichols, 2010).

Linearized 4DVar can be shown to be equivalent to Tikhonov, or L2-norm, regularization, a well-known method for solving ill-posed problems (Johnson et al., 2005). Such problems appear in a wide range of applications (Engl et al., 1996) such as geosciences and image restoration: the process of estimating an original image from a given blurred image. From the latter work it is known that by replacing the L2-norm penalty term with an L1-norm penalty function, image restoration becomes edge-preserving, as the process does not penalize the edges of the image. The L1-norm penalty regularization then recovers sharp edges in the image more precisely than the L2-norm penalty regularization (Hansen, 1998; Hansen et al., 2006). Edges in images lead to outliers in the regularization term and hence L1-norms for the regularization terms give a better result in image restoration. This is the motivation behind our approach for variational data assimilation.

The edge-preserving property of L1-norm regularization can be used for models that develop shocks, which is the case for moving weather fronts. In NWP and ocean forecasting, it is recognized that the 4DVar assimilation method may not give a good analysis where there is a sharp gradient in the flow, such as a front (Bennett, 2002; Lorenc, 1981). If the front is displaced in the background estimate, then the assimilation algorithm may smear the front and also underestimate the true amplitude of the shock (Johnson, 2003). In these cases the error covariances propagated implicitly by 4DVar are not representative of the correct error structures near the front. If model error is present, then there are systematic errors between the incorrect model trajectories and the observed data, and therefore the strong-constraint 4DVar, which assumes a perfect model, is not able to represent these errors correctly. Here we apply an L1-norm penalty approach to several numerical examples containing sharp fronts for cases with model error. We show that the L1-norm penalty approach applied to the gradient of the analysis vector (we call this mixed total variation (TV) L1L2-norm penalty regularization) performs better than the standard L2-norm regularization in 4DVar. With the use of the gradient operator and the L1 norm, localization of the gradient is enforced, which is important in tracking fronts. As an example we use the linear advection equation where sharp fronts and shocks are present. We use a numerical scheme that introduces some form of model error into the system and find that, using an L1-norm regularization term applied to the gradient of the solution, fronts are resolved more accurately than with the standard L2-norm regularization of 4DVar. Further investigation remains to be done in order to evaluate the technique in an operational setting.

Section 2 gives an introduction to 4DVar and shows its relation to Tikhonov regularization. In section 3 we introduce the new algorithms and in section 4 we explain how we solve the L1-norm regularization problem and the mixed TV L1L2-norm regularization problem. Sections 5 and 6 describe experiments using a linear advection model where the new regularization approaches are compared with standard 4DVar for cases with model error. Under these conditions it is seen that mixed TV L1L2-norm regularization outperforms 4DVar where sharp fronts are present. In the final section we present conclusions and discuss future work.

2. 4DVar and its relation to Tikhonov regularization

In nonlinear 4DVar we aim to minimize the objective function

\[ J(x_0) = \tfrac{1}{2} (x_0 - x_0^b)^T B_0^{-1} (x_0 - x_0^b) + \tfrac{1}{2} \sum_{i=0}^{N} \left( y_i - \mathcal{H}_i(x_i) \right)^T R_i^{-1} \left( y_i - \mathcal{H}_i(x_i) \right), \qquad (1) \]

subject to the system equations

\[ x_{i+1} = \mathcal{M}_{i+1,i}(x_i), \qquad i = 0, \dots, N-1. \qquad (2) \]

This is a nonlinear constrained minimization problem where the first term in (1) is called the background term, x_0^b is the background state at time t = 0 and x_i ∈ R^m are the state vectors at time t_i. The function ℳ_{i+1,i} denotes the nonlinear model that evolves the state vector x_i at time t_i to the state vector x_{i+1} at time t_{i+1}. In weather forecasting the state vector x_0^b is the best estimate of the state of the system at the start of the window from the previous assimilation/forecast cycle. The vectors y_i ∈ R^p contain the observations at times t_i and ℋ_i is the observation operator that maps the model state space to the observation space.

Minimizing (1) is a weighted nonlinear least-squares problem. By minimizing J we find an initial state x_0^a, known as the analysis, such that the model trajectory is close to the background trajectory and to the observations in a suitable norm. The symmetric matrix B_0 and the symmetric matrices R_i are assumed to represent the covariance matrices of the errors in the background and the observations, respectively. The matrices R_i describe the combined effects of measurement errors, representativity errors (arising from the need to interpolate state vectors to the times and locations of the observations) and errors in the observation operator. Provided the background and observation errors have Gaussian distributions with mean zero, then minimizing J is equivalent to finding the maximum a posteriori Bayesian estimate of the true initial condition (Lorenc, 1986).
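For completeness, this correspondence can be written out explicitly. Under the stated Gaussian assumptions the posterior density of the initial condition satisfies

\[ p(x_0 \mid y_0, \dots, y_N) \;\propto\; \exp\left\{ -\tfrac{1}{2} \| x_0 - x_0^b \|^2_{B_0^{-1}} \right\} \prod_{i=0}^{N} \exp\left\{ -\tfrac{1}{2} \| y_i - \mathcal{H}_i(x_i) \|^2_{R_i^{-1}} \right\}, \]

where ‖a‖²_W = a^T W a, so that J(x_0) = −log p(x_0 | y_0, …, y_N) up to an additive constant and the minimizer of (1) is the mode of the posterior.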

We apply a Gauß–Newton method (Dennis and Schnabel, 1983) in order to solve the minimization problem (1). From a starting guess x_0^{(0)}, Newton's method for solving the gradient equation ∇J(x_0) = 0 is

\[ x_0^{(k+1)} = x_0^{(k)} - \left[ \nabla^2 J\left( x_0^{(k)} \right) \right]^{-1} \nabla J\left( x_0^{(k)} \right) \qquad (3) \]

for k ≥ 0. In the Gauß–Newton method, the Hessian is replaced by an approximate Hessian \(\widetilde{\nabla^2 J}\) that neglects all the terms involving second derivatives of ℳ_{i+1,i} and ℋ_i. We let M_{i+1,i} be the Jacobian of ℳ_{i+1,i}. Here we only consider problems where the observation operator is linear, i.e. ℋ_i(x_i) = H_i x_i. Furthermore, both R_i = R and H_i = H are assumed to be unchanged over time.

The gradient of (1) is then given by

\[ \nabla J(x_0) = B_0^{-1} (x_0 - x_0^b) - \sum_{i=0}^{N} M_{i,0}(x_0)^T H^T R^{-1} \left( y_i - H x_i \right), \qquad (4) \]

where M_{i,0}(x_0) is the Jacobian of ℳ_{i,0}(x_0), with M_{0,0} taken to be the identity. The chain rule gives

\[ M_{i,0}(x_0) = M_{i,i-1}(x_{i-1}) \, M_{i-1,i-2}(x_{i-2}) \cdots M_{1,0}(x_0). \qquad (5) \]

Taking the gradient of (4) and neglecting terms involving the gradient of Mi,0(x0) gives

\[ \widetilde{\nabla^2 J}(x_0) = B_0^{-1} + \sum_{i=0}^{N} M_{i,0}(x_0)^T H^T R^{-1} H \, M_{i,0}(x_0). \qquad (6) \]

Both the summation terms in (4) and (6) can be obtained recursively using the adjoint equations

\[ \lambda_N = H^T R^{-1} (y_N - H x_N), \qquad \lambda_{i-1} = M_{i,i-1}(x_{i-1})^T \lambda_i + H^T R^{-1} (y_{i-1} - H x_{i-1}), \]

for i = N,…,1, in order to find the gradient

\[ \nabla J(x_0) = B_0^{-1} (x_0 - x_0^b) - \lambda_0, \qquad (7) \]

and similarly

\[ \nabla\lambda_N = H^T R^{-1} H, \qquad \nabla\lambda_{i-1} = M_{i,i-1}(x_{i-1})^T \, \nabla\lambda_i \, M_{i,i-1}(x_{i-1}) + H^T R^{-1} H, \]

for i = N,…,1, leads to

\[ \widetilde{\nabla^2 J}(x_0) = B_0^{-1} + \nabla\lambda_0. \qquad (8) \]

Using these adjoint equations we avoid having to compute Mi,i−1(xi−1) several times. We note that λi, i = 0,…,N are vectors whereas ∇λi, i = 0,…,N are square matrices of the dimension of the system state.

The approximate Hessian \(\widetilde{\nabla^2 J}(x_0^{(k)})\) and the gradient ∇J(x_0^{(k)}) are then used in (3), which is equivalent to a linearized least-squares problem. Here we solve this system directly. This approach is mathematically equivalent to the incremental 4DVar method as described in Lawless et al. (2005a, 2005b); in the incremental method, however, the inner equations (3) are solved iteratively.
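To make the procedure concrete, the following sketch implements this Gauss–Newton iteration in Python for a time-invariant linear model matrix M and linear observation operator H, with the gradient and approximate Hessian accumulated by the adjoint recursions leading to (7) and (8). It is an illustration under our own naming conventions, not the code used for the experiments below.

    import numpy as np

    def gauss_newton_4dvar(x_b, B_inv, ys, R_inv, H, M, n_iter=5):
        # ys is the list of observation vectors y_0, ..., y_N (one per time step);
        # B_inv and R_inv are the inverse background and observation covariances.
        x0 = x_b.copy()
        for _ in range(n_iter):
            # forward sweep: store the model trajectory x_0, ..., x_N
            xs = [x0]
            for _ in range(len(ys) - 1):
                xs.append(M @ xs[-1])
            # adjoint sweep: lam accumulates the observation part of the
            # gradient, dlam the Gauss-Newton Hessian term, cf. (7) and (8)
            lam = H.T @ R_inv @ (ys[-1] - H @ xs[-1])
            dlam = H.T @ R_inv @ H
            for i in range(len(ys) - 1, 0, -1):
                lam = M.T @ lam + H.T @ R_inv @ (ys[i - 1] - H @ xs[i - 1])
                dlam = M.T @ dlam @ M + H.T @ R_inv @ H
            grad = B_inv @ (x0 - x_b) - lam        # gradient (7)
            hess = B_inv + dlam                    # approximate Hessian (8)
            x0 = x0 - np.linalg.solve(hess, grad)  # Newton step (3), solved directly
        return x0

For a linear model the approximate Hessian is exact and the iteration converges in a single step; the outer loop only matters once ℳ is nonlinear and M must be re-evaluated as the Jacobian along the current trajectory.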

We may rewrite the objective function (1) in 4DVar as

\[ J(x_0) = \tfrac{1}{2} \left\| x_0 - x_0^b \right\|^2_{B_0^{-1}} + \tfrac{1}{2} \left\| \hat{y} - \hat{\mathcal{H}}(x_0) \right\|^2_{\hat{R}^{-1}}, \qquad (9) \]

where

\[ \hat{y} = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad \hat{\mathcal{H}}(x_0) = \begin{pmatrix} H x_0 \\ H \mathcal{M}_{1,0}(x_0) \\ \vdots \\ H \mathcal{M}_{N,0}(x_0) \end{pmatrix}, \qquad \hat{R} = \operatorname{diag}(R, \dots, R). \]

In general, \(\hat{\mathcal{H}}(x_0)\) is a nonlinear operator, ŷ is a vector containing all the observations and \(\hat{R}\) is a block-diagonal matrix with diagonal blocks equal to R. If we linearize ℳ_{i,0} about x_0^b, then the Jacobian \(\hat{H}\) of the augmented operator \(\hat{\mathcal{H}}\) is given by

\[ \hat{H} = \begin{pmatrix} H \\ H M_{1,0} \\ \vdots \\ H M_{N,0} \end{pmatrix}, \qquad (10) \]

which is essentially the observability matrix. Now writing B_0 = σ_b² C_B and \(\hat{R}\) = σ_o² C_R, where C_B and C_R denote correlation matrices, and performing the variable transform z = C_B^{-1/2}(x_0 - x_0^b), we may write the linearized objective function that we aim to minimize as

\[ \hat{J}(z) = \mu^2 \| z \|_2^2 + \left\| C_R^{-1/2} \left( \hat{y} - \hat{\mathcal{H}}(x_0^b) \right) - C_R^{-1/2} \hat{H} C_B^{1/2} z \right\|_2^2, \qquad \mu^2 = \sigma_o^2 / \sigma_b^2. \qquad (11) \]

This is equivalent to a linear least-squares problem with Tikhonov regularization (Engl et al., 1996), where μ² acts as the regularization parameter. If we set

\[ G = C_R^{-1/2} \hat{H} C_B^{1/2}, \qquad f = C_R^{-1/2} \left( \hat{y} - \hat{\mathcal{H}}(x_0^b) \right), \qquad (12) \]

where \(\hat{H}\) is the Jacobian (10) and f is the whitened innovation vector, then (11) may be written as

\[ \hat{J}_2(z) = \| f - G z \|_2^2 + \mu^2 \| z \|_2^2. \qquad (13) \]

If G is an ill-posed operator, or in the discrete setting an ill-conditioned matrix, then the minimization problem

\[ \min_z \| f - G z \|_2^2 \qquad (14) \]

is ill-posed: the solution z does not depend continuously on the data and is hard to compute accurately. In data assimilation the matrix G is generally ill-conditioned, which means it has singular values that decay rapidly, many of them very small or even zero. This problem occurs if there are not enough observations in the system, which is typical for NWP. Furthermore, the given observations are subject to errors, leading to errors in the vector f. Hence we can see that the minimization problem (14) with an ill-conditioned system matrix G and an unreliable data vector f will lead to an unstable solution, and some form of regularization is required (e.g. preconditioning, Tikhonov regularization, singular value filtering). We consider Tikhonov regularization, where a regularization term μ²‖z‖₂² is introduced, which leads to the objective function Ĵ₂(z) in (13). The minimization of the Tikhonov function (13) gives the regularized solution

\[ z_\mu = \left( G^T G + \mu^2 I \right)^{-1} G^T f = \sum_{\sigma_j > 0} \frac{\sigma_j^2}{\sigma_j^2 + \mu^2} \, \frac{u_j^T f}{\sigma_j} \, v_j, \qquad (15) \]

where I ∈ R^{m×m} is the identity matrix (see, for example, Hansen et al., 2006, Ch. 5, for details). The vectors u_j and v_j are the singular vectors of G belonging to the singular values σ_j, where G has the singular value decomposition G = UΣV^T, with U ∈ R^{pN×pN} and V ∈ R^{m×m} being orthogonal matrices and Σ being a pN × m matrix with entries σ_j ≥ 0, j = 1,…,min(pN,m), on the leading diagonal and zeros elsewhere. Hence the factor σ_j²/(σ_j² + μ²) acts as a filter factor for small singular values σ_j.
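The filtering effect is easily demonstrated. The following short sketch (our own illustration, not code from the paper) forms the regularized solution (15) directly from the singular value decomposition of G:

    import numpy as np

    def tikhonov_svd(G, f, mu):
        # Tikhonov-regularized solution (15) via the SVD of G
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        # the combination s / (s^2 + mu^2) equals the filter factor divided by
        # sigma_j, and remains well defined even for (near-)zero singular values
        coeff = s * (U.T @ f) / (s**2 + mu**2)
        return Vt.T @ coeff

Components with σ_j ≫ μ pass essentially unchanged (filter factor close to 1), whereas components with σ_j ≪ μ are damped towards zero; this is what stabilizes the ill-conditioned least-squares problem.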

It is known from image processing (Hansen et al., 2006) that instead of taking the L2-norm for the regularization term μ²‖z‖₂² (i.e. the background term), the L1-norm gives a better performance when sharp edges need to be recovered. The reason for the edge-preserving property of the L1-norm is that the L1-norm enforces a sparse solution (Donoho, 2006a). 4DVar performs poorly for the recovery of fronts. For shocks and fronts in the form of square waves or step functions, as in Figure 1 and all the following figures, the gradient of the solution is sparse and hence we introduce a mixed total variation L1L2-norm approach which aims to recover fronts and sparse solutions (see Wright et al., 2009). In general, the gradient would be small (but non-zero) away from the front, but L1 methods, which recover solutions with sparse gradients, should work well.

Figure 1.

Results for 4DVar applied to the linear advection equation where the initial condition is a square wave. We take imperfect observations every 20 points in space and every 2 time steps. 4DVar leads to spurious oscillations in the initial condition and also to a phase error in the forecast.

Hence we introduce and test two new approaches which are motivated by the L1-norm regularization and compare them to standard 4DVar: these are L1-norm regularization and a mixed total variation L1L2-norm regularization. Both are described in the next section.

3. L1-norm and mixed L1L2-norm regularization

With the notation in (12), the minimization problem in (11) can be written as (13) – known as standard Tikhonov regularization – where the second term is a regularization term and μ2 is the regularization parameter. In the literature there has been a growing interest in using L1-norm regularization for image restoration (see, for example, Fu et al., 2006; Agarwal et al., 2007; Schmidt et al., 2007).

Firstly, in this paper we consider the effects of L1-norm regularization for variational data assimilation by replacing the squared L2-norm in the regularization term μ²‖z‖₂² of (13) by the L1-norm to obtain

\[ \hat{J}_1(z) = \| f - G z \|_2^2 + \mu^2 \| z \|_1. \qquad (16) \]

Equation (13) can be written in stacked least-squares form as

\[ \hat{J}_2(z) = \left\| \begin{pmatrix} G \\ \mu I \end{pmatrix} z - \begin{pmatrix} f \\ 0 \end{pmatrix} \right\|_2^2. \qquad (17) \]

The minimization problems (16) and (19) aim to produce a solution z and hence, with x_0 = x_0^b + C_B^{1/2} z, an initial state x_0 such that the solution trajectory is both close to the background (the previous forecast) and to the observations in some weighted norm. The solution to problem (16) promotes sparsity in the solution; hence it promotes a sparse vector z. It has been shown that, with very high probability, the vector z minimized in the L1-norm has very few non-zero entries (for further mathematical details of sparsity-promoting minimization we refer to Donoho, 2006b). We will see that this is generally not so useful for our computations.

Both the L2-norm and the L1-norm minimization can be interpreted from a Bayesian point of view. For the L2-norm approach – which is equivalent to standard 4DVar – a Gaussian distribution is assumed for the error in the prior, i.e. for the background error. For the L1-norm, the background error is assumed to have a Laplace (double-sided exponential) distribution. (For details, see the Appendix.)

The advantage of using the L1-norm is that the solution is more robust to outliers. It has been observed that a small number of outliers have less influence on the solution (Fu et al., 2006).

However, if it is known that fronts are present in the solution then the gradient of the solution will be sparse – hence the gradient of the initial state x0 will be sparse. If we approximate the gradient by a matrix D given by

\[ D = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix}, \qquad (18) \]

then the minimization problem for an initial state with a sparse gradient, and hence a sharp front, becomes

\[ \hat{J}_{\mathrm{TV}}(z) = \| f - G z \|_2^2 + \mu^2 \| z \|_2^2 + \delta \| D x_0 \|_1, \qquad (19) \]

where x_0 = x_0^b + C_B^{1/2} z, D is given by (18) and δ is another regularization parameter which needs to be chosen. The size of δ determines how much sparsity is enforced on the gradient of the solution (see Table 1 for examples with different choices of δ). We will see in section 5 that minimizing Ĵ_TV(z) in (19) gives a much better resolution of the fronts than minimizing Ĵ₂(z) or Ĵ₁(z) in (13) or (16). (We remark that other choices for the derivative approximation D can be taken, but (18) is commonly used.)
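The sparsity claim is easily verified numerically: applying the difference matrix D of (18) to a square wave gives a vector with one non-zero entry per jump. In the small sketch below the front positions are arbitrary illustrative choices of ours.

    import numpy as np

    m = 100
    x = np.linspace(0.0, 1.0, m, endpoint=False)
    u0 = ((x >= 0.1) & (x < 0.3)).astype(float)  # square wave with two jumps
    D = np.diff(np.eye(m), axis=0)               # (m-1) x m forward differences, cf. (18)
    print(np.count_nonzero(D @ u0))              # prints 2: the gradient is sparse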

Table 1. Comparison between errors in the analysis for standard 4DVar, L1-norm regularization and mixed TV L1L2-norm regularization (TV), measured in the L2-norm. Here the length-scale is L = 5; the exponential matrices B have entries as in (29), and 'short window' denotes the reduced assimilation window of subsection 5.3.

Background B                              Observations         Standard 4DVar   L1-norm    TV, δ = 10   TV, δ = 100   TV, δ = 1000
B = I                                     Full perfect         2.3674           2.4392     1.1585       0.7674        0.2998
                                          Partial perfect      12.8039          13.6598    9.3621       0.4643        2.7286
                                          Partial imperfect    13.6182          14.4389    7.7128       0.4790        2.9110
B = 0.01I                                 Full perfect         1.0609           1.4780     0.8963       0.6998        0.2531
                                          Partial perfect      1.3791           10.0589    1.0935       0.2866        1.2440
                                          Partial imperfect    1.4614           9.9083     1.0060       0.1719        1.3910
B = 0.005I                                Full perfect         0.9012           1.4567     0.7987       0.6417        0.2272
                                          Partial perfect      0.8651           9.3547     0.6887       0.2260        0.8014
                                          Partial imperfect    0.8979           8.5296     0.6566       0.1500        0.9141
B_ij = e^{-|i-j|/L}                       Full perfect         1.1892           1.3703     0.9801       0.7391        0.2807
                                          Partial perfect      2.7845           11.6647    2.2421       0.3832        2.7031
                                          Partial imperfect    3.1041           11.1133    2.2780       0.5552        2.8524
B_ij = 0.01 e^{-|i-j|/L}                  Full perfect         0.4921           1.0184     0.4857       0.4346        0.1696
                                          Partial perfect      0.3150           2.0667     0.2938       0.1633        0.9128
                                          Partial imperfect    0.4161           1.5400     0.3997       0.3057        0.8456
B_ij = 0.005 e^{-|i-j|/L}                 Full perfect         0.4023           0.9396     0.3981       0.3636        0.1567
                                          Partial perfect      0.2304           0.6327     0.2171       0.1455        0.6922
                                          Partial imperfect    0.3225           0.5489     0.3139       0.2680        0.5686
B = I, short window                       Full perfect         2.1595           2.1858     0.5812       0.3406        0.6591
                                          Partial perfect      8.0773           8.2133     1.3201       0.5327        3.7108
                                          Partial imperfect    11.2487          11.4258    1.6075       0.6121        3.6611
B = 0.01I, short window                   Full perfect         0.6881           0.9963     0.4130       0.1996        0.4832
                                          Partial perfect      0.9441           1.7047     0.6182       0.2129        1.6974
                                          Partial imperfect    1.2017           2.5580     0.7971       0.1795        2.7750
B = 0.005I, short window                  Full perfect         0.5463           0.8378     0.3677       0.1553        0.3939
                                          Partial perfect      0.6809           1.4938     0.4903       0.1795        1.0246
                                          Partial imperfect    0.8293           2.0489     0.6132       0.1510        1.1469
B_ij = e^{-|i-j|/L}, short window         Full perfect         0.8842           1.0369     0.5210       0.2725        0.6112
                                          Partial perfect      1.2200           1.5908     0.7974       0.3784        3.6971
                                          Partial imperfect    1.7078           2.6882     1.0445       0.4655        3.6392
B_ij = 0.01 e^{-|i-j|/L}, short window    Full perfect         0.2256           0.2878     0.2166       0.1558        0.3266
                                          Partial perfect      0.4688           0.5948     0.4533       0.3000        0.4088
                                          Partial imperfect    0.3366           0.4790     0.3189       0.2864        1.1626
B_ij = 0.005 e^{-|i-j|/L}, short window   Full perfect         0.1959           0.2204     0.1913       0.1511        0.2443
                                          Partial perfect      0.3944           0.4887     0.3811       0.2782        0.9113
                                          Partial imperfect    0.2770           0.3799     0.2686       0.2691        0.8676

We find that, for fronts and shocks, regularization with an added L1-norm on the derivative of the initial condition in 4DVar gives much better results than the standard L2-norm approach in the presence of model error. When an L1-norm penalty term with a gradient as in (19) is added, one often speaks of TV regularization (Strong and Chan, 2003). We call the problem in (19) a mixed TV L1L2-norm regularization problem.

In the following section we explain how we solve the L1-norm minimization problem in (16) and the mixed TV L1L2-norm minimization problem in (19).

4. Least mixed norm solutions

Consider the minimization problems (16) and (19). In order to compute these least mixed norm solutions we use an approach introduced by Fu et al. (2006). Both problems (16) and (19) are solved in a similar way. We explain the algorithm using the minimization problem (19); the application of the algorithm to problem (16) is similar.

First, with x_0 = x_0^b + C_B^{1/2} z, problem (19) can be formulated as

\[ \min_{z,\, v} \left\{ \| f - G z \|_2^2 + \mu^2 \| z \|_2^2 + \delta \| v \|_1 \right\} \quad \text{subject to} \quad v = D x_0. \qquad (20) \]

We let

\[ v = D x_0 = D C_B^{1/2} z + D x_0^b \]

and split v into its non-negative and non-positive parts v+ and v; i.e.

\[ v^+ = \max(v, 0) \]

and

\[ v^- = \max(-v, 0), \qquad \text{so that } v = v^+ - v^- \text{ with } v^+, v^- \ge 0 \text{ (componentwise)}. \]

Problem (20) can then be written as

\[ \min_{z,\, v^+,\, v^-} \left\{ \| f - G z \|_2^2 + \mu^2 \| z \|_2^2 + \delta \, \mathbf{1}^T (v^+ + v^-) \right\} \qquad (21) \]

subject to the constraints

\[ D C_B^{1/2} z - v^+ + v^- = -D x_0^b, \qquad (22) \]
\[ v^+ \ge 0, \qquad v^- \ge 0. \qquad (23) \]

Here 1 denotes the vector of all ones of appropriate size. This problem can then be written as

\[ \min_u \left\{ \tfrac{1}{2} u^T H u + c^T u \right\} \qquad (24) \]

subject to

\[ E u = g, \qquad F u \ge 0, \qquad (25) \]

where

\[ u = \begin{pmatrix} z \\ v^+ \\ v^- \end{pmatrix}, \qquad H = 2 \begin{pmatrix} G^T G + \mu^2 I & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} -2 G^T f \\ \delta \mathbf{1} \\ \delta \mathbf{1} \end{pmatrix}, \]
\[ E = \begin{pmatrix} D C_B^{1/2} & -I & I \end{pmatrix}, \qquad g = -D x_0^b, \]
\[ F = \begin{pmatrix} 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}, \]

and the block matrices I and 0, as well as the vectors 1 of all ones, in H, E, F, c and g are of appropriate size. The objective function in (24) is convex as H is symmetric positive semi-definite. In order to solve the quadratic programming problem (24) with constraints (25) we use the built-in Matlab function quadprog.m, which readily solves problems of the form (24), (25). For our problem we use an active-set quadratic programming strategy (also known as a projection method), which is described in Gill et al. (1981). For details on the implementation of the Matlab quadratic programming tool we refer to the Matlab product documentation (Matlab, 2012).
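The same quadratic program can be posed outside Matlab. The sketch below solves the split problem (21)–(23) in Python with the cvxpy modelling package in place of quadprog; it is an illustrative reformulation under our own naming, not the authors' implementation.

    import cvxpy as cp

    def solve_mixed_tv(G, f, D, CB_half, x0b, mu, delta):
        # mixed TV L1L2-norm problem (19) via the splitting v = v+ - v-
        m = G.shape[1]
        z = cp.Variable(m)
        vp = cp.Variable(D.shape[0], nonneg=True)  # v+, constraint (23)
        vm = cp.Variable(D.shape[0], nonneg=True)  # v-, constraint (23)
        objective = cp.Minimize(cp.sum_squares(G @ z - f)
                                + mu**2 * cp.sum_squares(z)
                                + delta * cp.sum(vp + vm))      # objective (21)
        constraints = [D @ CB_half @ z - vp + vm == -D @ x0b]   # constraint (22)
        cp.Problem(objective, constraints).solve()
        return x0b + CB_half @ z.value  # analysis x_0 = x_0^b + C_B^{1/2} z

At the optimum at most one of (v+)_i and (v−)_i is non-zero for each i, so 1^T(v+ + v−) equals ‖D x_0‖₁ and the splitting reproduces the L1 term exactly.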

In the following section we consider a square wave advected using the linear advection equation as an example. We use a ‘true’ model (from which we take the observations) and another model, which is different from the truth and hence introduces a model error. The different models we use are introduced in the next section. In all examples we observe that the new edge-preserving mixed TV L1L2-norm regularization indeed gives better results than the standard L2-norm approach and the simple L1-norm regularization.

In all the examples we keep the regularization parameter μ fixed, as we are only investigating the influence of the norm in the regularization term, but not the size of the regularization parameter μ.

5. Numerical experiments

We consider the linear advection equation

\[ \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} = 0 \qquad (26) \]

on the interval x ∈ [0,1], with periodic boundary conditions. We discretize the equation using the upwind scheme

\[ u_j^{n+1} = u_j^n - \frac{\Delta t}{\Delta x} \left( u_j^n - u_{j-1}^n \right), \qquad (27) \]

where j = 1,…,N, and the CFL condition Δt < Δx needs to be satisfied for stability (for details see LeVeque, 1992; Morton and Mayers, 2005). Note that the lower index in (27) is spatial and the upper index is temporal.

The initial solution is a square wave defined by

\[ u(x, 0) = \begin{cases} 1, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases} \qquad (28) \]

where 0 < a < b < 1 mark the positions of the two fronts.

This wave moves through the time interval; the true solution is obtained by the method of characteristics (by advecting the initial condition at speed 1, i.e. u(x,t) = u(x−t,0)). The model equations are defined by the upwind scheme (27) with periodic boundary conditions u_0^n = u_N^n, where n = 1,…,80 is the time-step index, N = 100 and Δx = 1/N = 0.01. The same example is used in Griffith and Nichols (2000). For this example we take Δt = 0.005.
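A minimal implementation of this imperfect model (our own sketch, using the same grid parameters) makes the source of the model error visible: with Δt/Δx = 0.5 the upwind scheme is stable but numerically diffusive, so it smears the advected front relative to the exact solution u(x,t) = u(x−t,0).

    import numpy as np

    def upwind(u0, n_steps, dt=0.005, dx=0.01):
        # upwind scheme (27) for u_t + u_x = 0 with periodic boundaries
        c = dt / dx  # Courant number; stability requires c <= 1
        u = u0.copy()
        for _ in range(n_steps):
            u = u - c * (u - np.roll(u, 1))  # u_j <- u_j - c (u_j - u_{j-1})
        return u

In the twin experiments the observations are generated from the exact advected solution, while the assimilation uses this diffusive scheme; the mismatch between the two is the model error referred to throughout.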

5.1. A standard experiment

We consider an assimilation window of length 40 time steps. After the assimilation period we compute the forecast for another 40 time steps, and hence 80 time steps are considered in total. For the background and observation error covariance matrices we take B = 0.01I and R = 0.01I; hence the background is given the same weight as a single observation in this case. Moreover, we choose the background to be equal to the truth given by (28) perturbed by Gaussian noise with mean zero and covariance B. The background thus contains errors with variance of order 0.01. We test several cases.

  1. Perfect observations are taken everywhere in time and space.
  2. Perfect observations are taken every 20 points in space and every 2 time steps.
  3. Imperfect observations are taken every 20 points in space and every 2 time steps; for the observations we introduce Gaussian noise with mean zero and variance 0.01.

For all cases we test

  • standard 4DVar (minimization problem (13));

  • L1-norm regularization (minimization problem (16)); and

  • mixed TV L1L2-norm regularization (minimization problem (19)).

Figures 1–3 show the results for this example. We only present the case for imperfect noisy observations, as it is the most realistic one. In the next subsection we consider a non-diagonal background error covariance matrix, for which we present the results for cases (1)–(3). We also note that a summary of results is presented in Table 1.

Figure 2.

Results for L1 regularization for the same data as in Figure 1.

Figure 3.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 1. Mixed TV L1L2-norm regularization gives the best possible result for the initial condition.

In the plots the true solution is represented by a thick dot-dashed line (called ‘Truth’ in the legend). This true solution is unknown in practice. We take (noisy) observations by perturbing the true trajectory using zero-mean Gaussian noise. The model solution (which is derived from the upwind method) is shown as a dashed line (called ‘Imperfect model’ in the legend). This solution represents the model solution, i.e. the solution that is obtained if we use the correct initial conditions and the (imperfect) model. It represents the best solution that we are able to achieve (if data assimilation gives us the perfect initial condition), as the model error is always present. The solution obtained from the assimilation process by incorporating the (perfect/partial/noisy) observations is given by the solid line (called ‘Final solution’ in the legend).

The result for 4DVar is shown in Figure 1 (minimization problem (13)), that for L1-regularization in Figure 2 (minimization problem (16)) and that for mixed TV L1L2-norm regularization in Figure 3 (minimization problem (19)). The analysis obtained by 4DVar and L1-regularization is very inaccurate, with many oscillations and large over/undershoots near the discontinuities (first plots in Figures 1 and 2). When L1-norm regularization with the gradient (mixed TV L1L2-norm regularization) is used, the initial condition is more accurate (first plot in Figure 3). The same results hold for full and partial perfect observations. The rows for B = 0.01I in Table 1 quantify the errors in the analysis for this situation for 4DVar, L1-norm regularization and the L1-norm total variation approach. We see that for all types of observations we investigated (partial, full, perfect and noisy observations), mixed TV L1L2-norm regularization gives the smallest initial condition error.

Traditional strong constraint 4DVar does not take model error into account. Hence 4DVar's attempts to compensate for the initial condition error are obstructed by the use of an imperfect forecast model and it therefore does not produce an accurate estimate of the truth at the initial time. The errors in the initial state estimated by 4DVar act to force the trajectory propagated by the incorrect model to match the observed data from the true model and hence act to compensate, on average, for the model error. From the final plot in Figure 1 for 4DVar we also see that the forecast is inaccurate, due to the incorrect estimate produced at the end of the assimilation window. We also observe that the forecast in 4DVar leads to a slight phase shift and the wrong amplitude in the forecast, as well as overshooting and undershooting. This behaviour is improved for mixed TV L1L2-norm regularization (see Figure 3). We see from the first plot of Figure 3 that the initial condition obtained from mixed TV L1L2-norm regularization is the most accurate and hence the best possible forecast (see final plot of Figure 3) is obtained (subject to model error). This behaviour is due to the property of mixed TV L1L2-norm regularization enforcing sparsity on the gradient of the solution.

The results shown in Figure 1 demonstrate a worst-case scenario for 4DVar, where there is no smoothing of the noisy analysis due to the use of the simple diagonal covariance matrix B. It is interesting to note, however, that, despite the lack of smoothing, the mixed TV L1L2-norm regularization method (Figure 3) successfully eliminates oscillations in the analysis. From our experiments, it emerges that this is characteristic of this regularization technique.

In the following subsections we change the experimental design in order to check the robustness of the regularizations. A more realistic matrix B is introduced in subsection 5.2 and used in the following experiments. In subsection 5.3 we investigate a change in the size of the assimilation window and in subsection 5.4 we summarize the results from the different experimental designs. In section 6 we assess the influence of 4DVar and mixed TV L1L2-norm regularization on systems with different a priori background information.

5.2. Changing the background error covariance matrix

We take precisely the same experiments as in the previous subsection (5.1); however, we change the background error covariance matrix from a multiple of the identity matrix to an exponential covariance matrix B with entries

\[ B_{ij} = \sigma_b^2 \, \mathrm{e}^{-|i-j|/L} \qquad (29) \]

and σ_b² = 0.01, with length-scale L = 5. Hence B is a symmetric matrix with diagonal entries equal to 0.01 and off-diagonal entries that decay exponentially. This background error covariance matrix spreads the information from the observations more adequately and the error variance is still 0.01. Note that for this matrix the inverse is a tridiagonal matrix. For the background we choose Gaussian noise with covariance B and a mean value which is given by the truth. These errors are consistent with the choice of B.
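The claimed structure is easy to check numerically; the sketch below (our own construction of (29)) builds B and confirms that its inverse is tridiagonal up to rounding error.

    import numpy as np

    m, L, sigma2 = 100, 5.0, 0.01
    idx = np.arange(m)
    B = sigma2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / L)  # entries (29)
    B_inv = np.linalg.inv(B)
    far = np.abs(idx[:, None] - idx[None, :]) > 1  # beyond the first off-diagonal
    print(np.abs(B_inv[far]).max())  # tiny (rounding error): the inverse is tridiagonal

This is the Markov property of the exponential (first-order auto-regressive) correlation function; it makes B cheap to invert while still spreading observational information to neighbouring grid points.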

We present the results for perfect, partial and imperfect partial observations (cases (1)–(3) in the description of the standard experiment in the previous subsection). Further cases are summarized in Table 1. We also do not present the results for L1-norm regularization here as we have seen in subsection 5.1 that this approach is not better than standard 4DVar. The more interesting case is the mixed TV L1L2-norm regularization.

Figures 4 and 5 show the results for perfect observations where the background error covariance matrix B is given by (29). Mixed TV L1L2-norm regularization (Figure 5) behaves consistently better than standard L2-norm regularization (Figure 4). In particular, the shape of the wave is distorted and there are small undershoots and overshoots in the 4DVar analysis (first plot in Figure 4), which lead to small errors and the wrong amplitude in the forecast (final plot in Figure 4). For the analysis using mixed TV L1L2-norm regularization, the initial condition (first plot in Figure 5) shows a smaller error than the initial condition in standard 4DVar (first plot in Figure 4) and the forecast is slightly better than the forecast in 4DVar.

Figure 4.

Results for 4DVar. We take perfect observations at each point in time and space over the assimilation interval, which is 40 time steps. The four plots show the initial conditions at t = 0 and the result after 20, 40 and 80 time steps. We choose B as in (29) with σ_b² = 0.01 and L = 5.

Figure 5.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 4.

For the case of partial perfect observations we obtain similar results. Mixed TV L1L2-norm regularization (Figure 7) gives better initial conditions than standard 4DVar (Figure 6).

Figure 6.

Results for 4DVar for the same data as in Figure 4 but with perfect observations every 20 points in space and every 2 time steps, for B as in (29) with σ_b² = 0.01 and L = 5.

Figure 7.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 6.

Finally, Figures 8 and 9 show the results for partial noisy observations. Note that with this choice of B the results for 4DVar (Figure 8) are better than the results for the diagonal matrix B (Figure 1) because information is spread via the covariance matrix B, and we see that the oscillations in the analysis are significantly reduced. It is notable, however, that the mixed TV L1L2-norm regularization (Figure 3) eliminates oscillations in the analysis even when the matrix B provides no smoothing. Moreover, where correlations are taken into account via the matrix B, then mixed L1L2-norm regularization (Figure 9) gives still better results than 4DVar (Figure 8). The errors in the initial conditions for this particular case are summarized in the rows of Table 1 for B given by (29) with σ_b² = 0.01, where we see that the errors using mixed TV L1L2-norm regularization are the smallest.

Figure 8.

Results for 4DVar for the same data as in Figure 1, but for B as in (29) with σ_b² = 0.01 and L = 5.

Figure 9.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 8, for B as in (29) with σ_b² = 0.01 and L = 5.

5.3. Changing the length of the assimilation window

Again, we take the same experimental data as in subsection 5.1; this time, however, we reduce the size of the assimilation window from 40 time steps to 5 time steps and carry out the following test: we take imperfect observations every 5 points in space and every 2 time steps with Gaussian noise of mean zero and variance 0.01. For the background we again take the truth perturbed by Gaussian noise with covariance B taken from (29) with σ_b² = 0.01.

Figures 10 and 11 show the results for a reduced size of the assimilation window. The first observation that we can make is that again the regularization using the mixed TV L1L2-norm (Figure 11) is consistently better than that using the L2-norm (Figure 10). Standard 4DVar produces oscillations, in particular in the initial conditions, whereas the mixed TV L1L2-norm regularization does not show any oscillations. The oscillations in the initial conditions in standard 4DVar then lead to errors in the forecast (see plots for t = 5, t = 20 and t = 45 in Figure 10). Again, for 4DVar, the forecast of the analysis does not keep the amplitude correctly (final plot in Figure 10), whereas the mixed TV L1L2-norm regularization provides a more accurate amplitude in the forecast (final plot in Figure 11).

Figure 10.

Results for 4DVar applied to the linear advection equation where the initial condition is a square wave. We take imperfect observations every 5 points in space and every 2 time steps over the assimilation interval, which is 5 time steps. The four plots show the initial conditions at t = 0 and the result after 5, 20 and 45 time steps. 4DVar leads to oscillations in the initial condition and a misplaced discontinuity in the forecast.

Figure 11.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 10. Mixed TV L1L2-norm regularization gives the best possible result for the initial condition.

5.4. Summary of initial condition errors

In Table 1 we summarize the analysis errors (the errors between the analysis and the truth at t = 0, i.e. the initial condition errors) measured in the L2 vector norm for the different regularization techniques. The results are shown for all three test cases described in section 5.1, where either perfect observations are taken at all spatial and time points, partial perfect observations are taken less frequently in time and space, or partial imperfect (noisy) observations are taken, also with less frequency. We choose observation errors with covariance R = 0.01I and assimilation windows of length 40 and length 5. We consider two forms of background covariance matrix: diagonal matrices B = σ_b²I and the double-sided exponential covariance matrix B given by (29), each with three different variances: σ_b² = 1, σ_b² = 0.01 and σ_b² = 0.005.

For the mixed TV L1L2-norm regularization method, we also give results for different values of δ in (19). The emphasis on the sparsity of the gradient of the initial condition depends on this regularization parameter. We have looked at three different values for δ; the best of the three results (i.e. the smallest error in the initial condition) can be read off from the corresponding columns of the table. The regularization depends on the regularization parameter, but investigating the influence of this parameter and finding the optimal choice of δ are beyond the scope of this paper. We remark that for the plots in the previous subsections we have used the value of δ from the table that gives the smallest initial condition error.

We see from the entries in the table that the errors in the analysis at time t = 0 are consistently smaller for mixed TV L1L2-norm regularization than for standard 4DVar or L1-norm regularization. Mixed TV L1L2-norm regularization gives an error about one order of magnitude smaller than standard 4DVar. We also observe from the table that, for standard 4DVar, L1-norm regularization and mixed TV L1L2-norm regularization, the errors in the initial condition (analysis) decrease as the variance in the background error is reduced, i.e. as the ratio of the background to observation variance decreases. This is consistent with the results of Haben et al. (2010), which show that the standard 4DVar assimilation problem becomes better conditioned as this ratio decreases. These examples demonstrate that, even where the noise in the background and observations is Gaussian with known covariances, the standard 4DVar approach does not produce as accurate an analysis as mixed TV L1L2-norm regularization in the presence of sharp fronts and model error.

6. Further experiments

We now investigate how the 4DVar and mixed TV L1L2-norm regularization methods perform in cases where the position of the shock in the background is displaced from the truth and where the frontal gradient of an advected wave in the background is incorrect. As discussed in the Introduction, it is recognized that if a shock in the background field is displaced then the 4DVar method may not give a good analysis. Similarly, the assimilation method may be unable to capture a sharp shock if there is a weak gradient in the background front.

6.1. A shifted background

We consider the same problem as in subsection 5.1, with the same experimental data and error covariance matrices.

However, here we shift the square wave in the background by 0.02 to the right, so that the shock is displaced. The reason for this shift is a practical one: fronts are often resolved correctly in numerical weather forecasting, but the front is often predicted to be in the wrong position. We simulate this situation in our simplified model by assuming a slightly shifted background. We add noise to this background, taken from a normal distribution with (i) a background error covariance matrix B = 0.01I and (ii) a background error covariance matrix B taken from (29) with σ_b² = 0.01, which is consistent with the error in the shifted background. We only consider the case with partial noisy observations, since this is the most interesting and realistic one.

The results for background error covariance matrix B = 0.01I are shown in Figures 12 (4DVar) and 13 (mixed TV L1L2-norm regularization). We note that in the case of 4DVar the recovered analysis (first plot in Figure 12) is very oscillatory and at the end of the window the solution contains undershoots (third plot in Figure 12), whereas in the analysis for the mixed TV L1L2-norm regularization (first plot in Figure 13) the oscillations are removed and the front sharpened, although not recovered exactly. Furthermore, mixed TV L1L2-norm regularization contains no undershoots at the end of the window and retains the amplitude of the front more accurately than 4DVar (see second and third plots in Figures 12 and 13). The errors in the analysis, measured in the L2-norm, for the standard 4DVar method and for the mixed TV L1L2-norm regularization (with δ = 100) are given respectively by 1.75 and 1.17, demonstrating the increased accuracy achieved by the new method.

Figure 12.

Results for 4DVar for a shifted (and noisy) background and for background error covariance matrix B = 0.01I.

Figure 13.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 12.

The results for an exponential covariance matrix B with σ_b² = 0.01 are shown in the plots in Figures 14 and 15. The initial condition in 4DVar is clearly recovered poorly, with many oscillations (see first plot in Figure 14). Furthermore, at the end of the assimilation window the solution exhibits undershoots (see third plot in Figure 14) and the amplitude of the front is reduced (see second and third plots in Figure 14). The solution at the initial time provided by the mixed TV L1L2-norm regularization has less oscillation present in the shock wave (see first plot in Figure 15) and produces somewhat less distortion of the wave front over the window (see second and third plots in Figure 15). The errors in the analysis in this case for the standard 4DVar and the mixed TV L1L2-norm regularization (with δ = 10) are similar, with a value of 1.80. Both methods smear the shock front and both produce an initial phase error which is reproduced in the forecast.

Figure 14.

Results for 4DVar for a shifted (and noisy) background and for background error covariance matrix B taken from (29) with σ_b² = 0.01.

Figure 15.

Results for mixed TV L1L2-norm regularization for the same data as in Figure 14.

The plots in Figures 14 and 15 show that choosing an exponential (non-diagonal) covariance matrix B is not necessarily advantageous when there is a sharp front with a phase error. In this case both 4DVar and mixed TV L1L2-norm regularization with a diagonal covariance matrix B capture the shock front more accurately, but the mixed TV L1L2-norm technique also eliminates the oscillations in the analysis arising from the effects of the model error (see Figures 12 and 13).

In general, the mixed norm approach removes oscillations and sharpens fronts, but the position of the shock is not recovered precisely where there is a phase error in the background.

6.2. A slanted front for the background

Finally, with the same experimental data as in subsection 5.1 we consider a slanted background u^b(x,0): a piecewise-linear 'slanted' version of the square wave (28), in which the two vertical edges are replaced by steep linear ramps (30). This slanted background is plotted (without noise) in Figure 16.

Figure 16.

Slanted background (without noise) for the example in subsection 6.2.

We add noise to this background, taken from a normal distribution with covariance matrix B taken from (29) with σ_b² = 0.01, which is consistent with the error between the slanted background (30) and the true initial condition given by (28).

Figures 17 (4DVar) and 18 (mixed TV L1L2-norm regularization) show the results where the background is given by (30). 4DVar produces oscillations in the initial condition (first plot in Figure 17) and is not able to recover the correct initial condition from the (wrong) slanted background. The mixed TV L1L2-norm regularization approach, however, does not generate oscillations in the initial condition (first plot in Figure 18) and moreover produces a well-recovered front, given that the background was given by a (wrong) slanted front (compare the results in the first plot of Figure 18 with the background in Figure 16). The error in the analysis, measured in the L2-norm, for the standard 4DVar method is 1.10, while the error in the mixed TV L1L2-norm method with δ = 100 is only 0.86 – a clear improvement. Hence we conclude that the mixed TV L1L2-norm regularization removes oscillations and sharpens fronts and steep gradients, whereas standard 4DVar, even where smoothing is provided via the background covariance matrix, introduces oscillations where the background frontal gradient is incorrect.

Figure 17.

Results for 4DVar for a slanted (and noisy) background taken from (30).

Figure 18.

Results for mixed TV L1L2-norm regularization for a slanted (and noisy) background taken from (30).

7. Conclusions and future work

In this paper we have presented mixed TV L1L2-norm regularization, a new approach for variational data assimilation. We have given numerical examples containing shock fronts in order to demonstrate that mixed TV L1L2-norm regularization gives better results than either the standard 4DVar (L2-norm regularization) technique or a simple L1-norm regularization technique in the presence of model error. The errors in the analysis at time t = 0 are found to be consistently smaller for mixed TV L1L2-norm regularization than for standard 4DVar or L1-norm regularization for a range of ratios of observation to background variance and for both perfect and noisy observations with various temporal and spatial frequencies. These examples demonstrate that, even where the noise in the background and observations is Gaussian with known covariances, the standard 4DVar approach does not produce as accurate an analysis as mixed TV L1L2-norm regularization in the presence of sharp fronts and model error.

One of the strengths of the mixed TV L1L2-norm regularization is that in the case where the background covariance matrix B is poorly tuned it gives a much better performance than 4DVar in the presence of model error. This is relevant to operational NWP where the matrix B is difficult to determine and must, in any case, be simplified to make the assimilation problem computationally tractable.

Future work will be to apply this technique to higher-dimensional and possibly multi-scale problems. Because the minimization process for the mixed TV L1L2-norm regularization approach in (19) is more involved than that for the standard approach in (13), practical implementations will also have to be investigated together with the efficiency of this new approach.

Appendix

The solution to the data assimilation problem can be interpreted in statistical terms, where certain assumptions about the errors hold (Nichols, 2010). For the standard 4DVar problem, Gaussian errors are assumed for both the background and the observations, so the minimizer of the objective function (1) is the maximum a posteriori estimate of the state, given the observations and the prior. A similar derivation can be made for L1-norm regularization (16) (which we do here for the univariate case).

The addition of the penalty term μ²‖z‖₁ in (16) to the least-squares term is sometimes also referred to as Lasso regression in statistics (Tibshirani, 1996). Now, |z_i|, where z_i is the ith entry of z, is proportional to the negative log-density of the Laplace (or double-sided exponential) distribution. Hence the L1-norm regularization can be derived as a Bayesian posterior estimate, where the priors are independently distributed variables with Laplace probability density function

\[ p(z_i) = \frac{1}{\gamma} \exp\left( -\frac{2 |z_i|}{\gamma} \right), \qquad (31) \]

where γ = 2/μ². The in-depth mathematical investigation of L1-norm regularization is the subject of future research and beyond the scope of this paper.
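Matching terms makes the correspondence explicit. With the whitened Gaussian likelihood p(f | z) ∝ exp(−‖f − Gz‖₂²) implied by (13) and the prior (31), the negative logarithm of the posterior is, up to an additive constant,

\[ -\log p(z \mid f) \;=\; \| f - G z \|_2^2 + \frac{2}{\gamma} \| z \|_1 + \text{const}, \]

so the choice γ = 2/μ² reproduces the L1-norm objective Ĵ₁(z) in (16), and its minimizer is the posterior mode.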

In order to solve equation (3) at each step we use a direct matrix decomposition method (Gaussian elimination).

We remark that the solution of the minimization problem using the least-mixed-norm approach described in section 4 (see also Fu et al., 2006) is more expensive than standard 4DVar as the problem size is increased. More efficient methods need to be found for the minimization; the details are beyond the scope of this paper.

We note that traditional 4DVar is not designed to deal with model error. Hence, for future work, a fairer comparison would be weak-constraint 4DVar (see, for example, Trémolet, 2006) with mixed TV L1L2-regularization.

Acknowledgements

The authors thank Nathan Smith (University of Bath) for helpful discussions on the subject of L1-norm regularization. The research of the first and third authors is supported by Great Western Research (GWR) Grant ‘Numerical weather prediction: multi-scale methods and data assimilation’ and by the Bath Institute for Complex Systems (BICS, EPSRC Critical Mass Grant). The research of the second author is supported by the National Centre for Earth Observation (NCEO).
