### Abstract

The analysis-error variance of a 3D-FGAT assimilation is examined analytically using a simple scalar equation. It is shown that the analysis-error variance may be greater than the error variances of the inputs. The results are illustrated numerically with a scalar example and a shallow-water model. Copyright © 2010 Royal Meteorological Society

### 1. Introduction

Data assimilation is widely used in weather and ocean forecasting to provide initial conditions for numerical forecast models. By combining observational data with an *a priori*, or background, estimate of the model state, it is possible to obtain an improved estimate of the current state of the system, known as the analysis. Many data assimilation techniques are based on Bayes' rule, which in the case of Gaussian errors is equivalent to a least-squares fit. For such techniques, provided that the errors in the observations and background state are correctly represented, the analysis obtained will be at least as accurate as the most accurate piece of input information, in a statistical sense. Thus the addition of more information into the assimilation procedure cannot degrade the analysis.

In practice, many approximations must be made in designing data assimilation schemes for practical use. One such approximation is known as 3D-FGAT (first guess at appropriate time), which can be considered as a half-way step between incremental three-dimensional variational data assimilation (3D-Var) and incremental four-dimensional variational data assimilation (4D-Var). The aim of 3D-FGAT is to provide some of the time information of a sequence of observations, without the need to code a full tangent linear and adjoint model, as needed in incremental 4D-Var. Such a scheme was used to generate the ERA-40 re-analysis (Uppala *et al.*, 2005) and has been used in several atmospheric, oceanic and chemical assimilation systems (e.g. Vialard *et al.*, 2003; Lee *et al.*, 2004; Barret *et al.*, 2008). In this note, we examine the effect of the 3D-FGAT approximation on the analysis errors for a simple problem. By considering the scheme as an approximation to incremental 4D-Var, we show that the effect of the approximation is to increase the variance of the analysis error. In particular, we show that, whereas for a perfect incremental 4D-Var system the analysis-error variance can be no larger than the smallest of the variances of the inputs, the analysis-error variance in 3D-FGAT can exceed the error variance of the inputs.

### 2. Assimilation schemes

Incremental 4D-Var requires the minimization of a sequence of linearized cost functions of the form

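$$
J(\delta\mathbf{x}_0) = \frac{1}{2}\,\delta\mathbf{x}_0^{\mathrm{T}}\mathbf{B}^{-1}\delta\mathbf{x}_0 + \frac{1}{2}\sum_{i=0}^{n}\left(\mathbf{H}_i\,\delta\mathbf{x}_i - \mathbf{d}_i\right)^{\mathrm{T}}\mathbf{R}_i^{-1}\left(\mathbf{H}_i\,\delta\mathbf{x}_i - \mathbf{d}_i\right), \tag{1}
$$

written here for an outer loop linearized about the background state, where **B** is the background-error covariance matrix, **R**_{i}, **H**_{i} and **d**_{i} are the observation-error covariance matrices, linearized observation operators and innovation vectors at times *t*_{i}, and δ**x**_{i} satisfies the tangent linear model equation. The method is equivalent to solving the full nonlinear 4D-Var problem using a Gauss–Newton method (Lawless *et al.*, 2005).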

The algorithm for 3D-FGAT is very similar to incremental 4D-Var, except that the tangent linear model is approximated by the identity, so that in the linearized cost function (1) we have δ**x**_{i} = δ**x**_{0} for all times *t*_{i}. This introduces a discrepancy between the nonlinear model used to calculate the innovations in the outer loop and the linear model used to evolve the perturbations in the inner loop. We note that, in operational implementations of 3D-FGAT, the increment is usually defined to be at the centre of a time window of observations rather than at the beginning. In this case the linearized cost function takes the form

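$$
J(\delta\mathbf{x}) = \frac{1}{2}\,\delta\mathbf{x}^{\mathrm{T}}\mathbf{B}^{-1}\delta\mathbf{x} + \frac{1}{2}\sum_{i=0}^{n}\left(\mathbf{H}_i\,\delta\mathbf{x} - \mathbf{d}_i\right)^{\mathrm{T}}\mathbf{R}_i^{-1}\left(\mathbf{H}_i\,\delta\mathbf{x} - \mathbf{d}_i\right), \tag{2}
$$

where the single increment δ**x** is valid at the centre of the time window.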

In the next section we examine the analysis error of 3D-FGAT for a simple scalar example.
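As a minimal sketch of the difference between the two inner loops, the following NumPy function evaluates a quadratic inner-loop cost for an increment, assuming an outer loop linearized about the background. The function name `inner_loop_cost`, its argument layout and the representation of the tangent linear propagators as a list of matrices `M_list` are assumptions made for illustration only. Passing `M_list=None` replaces every propagator by the identity, which is the 3D-FGAT approximation described above.

```python
import numpy as np


def inner_loop_cost(dx0, B, R_list, H_list, d_list, M_list=None):
    """Quadratic inner-loop cost for an increment dx0 (illustrative sketch).

    B       : background-error covariance matrix
    R_list  : observation-error covariance matrices, one per observation time
    H_list  : linearized observation operators, one per observation time
    d_list  : innovation vectors, one per observation time
    M_list  : hypothetical tangent linear propagators from the time of dx0
              to each observation time; None means "identity everywhere",
              i.e. the 3D-FGAT approximation.
    """
    dx0 = np.asarray(dx0, dtype=float)
    # Background term: 0.5 * dx0^T B^{-1} dx0
    cost = 0.5 * dx0 @ np.linalg.solve(B, dx0)
    for i, (R, H, d) in enumerate(zip(R_list, H_list, d_list)):
        # Evolve the increment with the tangent linear model, or not at all
        dxi = dx0 if M_list is None else M_list[i] @ dx0
        residual = H @ dxi - d
        # Observation term: 0.5 * r^T R^{-1} r
        cost += 0.5 * residual @ np.linalg.solve(R, residual)
    return float(cost)


if __name__ == "__main__":
    one = np.array([[1.0]])
    d = [np.array([1.0]), np.array([1.0])]
    # Same increment, evaluated with the identity and with a scalar
    # propagator alpha**2 = 4 to the second observation time.
    print(inner_loop_cost([1.0], one, [one, one], [one, one], d))
    print(inner_loop_cost([1.0], one, [one, one], [one, one], d,
                          M_list=[one, 4.0 * one]))
```

The two calls differ only in whether the increment is evolved to the observation times, which is exactly the distinction between the 3D-FGAT and incremental 4D-Var inner loops.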

### 3. Analysis error for a simple example

We consider the analysis error after one outer loop of incremental 4D-Var and of 3D-FGAT. In order to illustrate the effect of the FGAT approximation, we consider the simple example of a linear model for a scalar variable *x*. We define the full discrete system model to be

$$
x_{i+1} = \alpha\,x_i, \tag{3}
$$

where α is a scalar constant and *x*_{i}≡*x*(*t*_{i}). We consider an example in which we have a time window [*t*_{0}, *t*_{2}] with observations *y*_{0}, *y*_{2} at times *t*_{0} and *t*_{2} respectively. We assume that the errors on each observation have error variance σ_{o}^{2} and that the errors are uncorrelated. For this system, since the full model (3) is already linear, the tangent linear model has the same form as the full model and is given by

$$
\delta x_{i+1} = \alpha\,\delta x_i, \tag{4}
$$

where δ*x*_{i} is a small perturbation to the state *x*_{i}. We note that, because the system is linear, we expect the incremental 4D-Var algorithm to give the same solution as the minimization of the full 4D-Var cost function, since no approximation is made in the linearization step. Although this is a considerable simplification, made to keep the mathematical analysis tractable, it does not detract from the validity of the approach; it allows us to analyse the effects of 3D-FGAT in the simplest possible system.

We must be very precise about the properties of the background field at the different points in the time window. We will assume that the background field for the 3D-FGAT scheme is *x*_{b}(*t*_{1}), defined at the centre of the window, time *t*_{1}, with error variance σ_{b}^{2}. In order to derive a 4D-Var scheme for this system we need to have a background field at the start of the time window, time *t*_{0}. Since the model is linear, the background field at time *t*_{0} is simply given by

$$
x_b(t_0) = \frac{x_b(t_1)}{\alpha}, \tag{5}
$$

and this will have an error variance of σ_{b}^{2}/α^{2}. The innovation vectors for this problem are

$$
d_0 = y_0 - x_b(t_0), \qquad d_2 = y_2 - \alpha^2 x_b(t_0). \tag{6}
$$

The inner loop cost function for incremental 4D-Var is then

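$$
J(\delta x_0) = \frac{\alpha^2}{2\sigma_b^2}\,\delta x_0^2 + \frac{1}{2\sigma_o^2}\left(\delta x_0 - d_0\right)^2 + \frac{1}{2\sigma_o^2}\left(\alpha^2\,\delta x_0 - d_2\right)^2. \tag{7}
$$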

We minimize this to obtain δ*x*_{0} and add this to the background field *x*_{b}(*t*_{0}) at time *t*_{0} to obtain the incremental 4D-Var analysis

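$$
x_a(t_0) = x_b(t_0) + \frac{\sigma_b^2\left(d_0 + \alpha^2 d_2\right)}{\alpha^2\sigma_o^2 + \left(1 + \alpha^4\right)\sigma_b^2}, \tag{8}
$$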

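which has analysis-error variance

$$
\sigma_a^2(t_0) = \frac{\sigma_o^2\,\sigma_b^2}{\alpha^2\sigma_o^2 + \left(1 + \alpha^4\right)\sigma_b^2}. \tag{9}
$$

As expected, we find that the analysis-error variance is both less than the observation-error variance σ_{o}^{2} and less than the error variance of the background field used, which in this case is given by σ_{b}^{2}/α^{2} at time *t*_{0}.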

If the analysis is evolved to the centre of the time window using the model (3), then the analysis-error variance of the incremental 4D-Var scheme at that time is

$$
\sigma_a^2(t_1) = \frac{\alpha^2\,\sigma_o^2\,\sigma_b^2}{\alpha^2\sigma_o^2 + \left(1 + \alpha^4\right)\sigma_b^2}, \tag{10}
$$

which is less than σ_{b}^{2}, the background-error variance at time *t*_{1}. This result is a simple extension in time of a standard result for minimum variance estimation (e.g. Daley, 1991, section 4.1).
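The bound on the incremental 4D-Var analysis-error variance can be checked numerically with a short sketch. It assumes the scalar model *x*_{i+1} = α*x*_{i}, a background-error variance σ_{b}^{2} at *t*_{1} (hence σ_{b}^{2}/α^{2} at *t*_{0}) and two uncorrelated observations with variance σ_{o}^{2}; the function name `fourdvar_variances` is hypothetical. The analysis-error variance is computed as the inverse of the summed precisions of the three pieces of information, each mapped to *t*_{0}.

```python
def fourdvar_variances(alpha, sigma_b2, sigma_o2):
    """Analysis-error variance of the scalar 4D-Var example (sketch).

    Returns the variance at t_0 and the variance after evolving the
    analysis to the window centre t_1 with x_{i+1} = alpha * x_i.
    """
    # Background-error variance mapped back from t_1 to t_0
    background_var_t0 = sigma_b2 / alpha**2
    # Precisions add: background, observation at t_0, observation at t_2.
    # The observation at t_2 sees alpha**2 * dx_0, hence the alpha**4 factor.
    precision = 1.0 / background_var_t0 + 1.0 / sigma_o2 + alpha**4 / sigma_o2
    var_t0 = 1.0 / precision
    # Evolving the analysis by one model step scales the variance by alpha**2
    var_t1 = alpha**2 * var_t0
    return var_t0, var_t1


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 10.0):
        var_t0, var_t1 = fourdvar_variances(alpha, sigma_b2=1.0, sigma_o2=1.0)
        print(f"alpha={alpha:5.1f}  var(t0)={var_t0:.4f}  var(t1)={var_t1:.4f}")
```

For every value of α in this loop the variance at *t*_{1} stays below σ_{b}^{2} and the variance at *t*_{0} stays below both σ_{o}^{2} and σ_{b}^{2}/α^{2}, in line with the minimum-variance result quoted above.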

For the 3D-FGAT scheme applied to this problem, the increment is considered to be valid at the centre of the time window *t*_{1} and the inner loop cost function (2) is given by

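$$
J(\delta x_1) = \frac{1}{2\sigma_b^2}\,\delta x_1^2 + \frac{1}{2\sigma_o^2}\left(\delta x_1 - d_0\right)^2 + \frac{1}{2\sigma_o^2}\left(\delta x_1 - d_2\right)^2, \tag{11}
$$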

which has a minimum at

$$
\delta x_1 = \frac{\sigma_b^2\left(d_0 + d_2\right)}{\sigma_o^2 + 2\sigma_b^2}. \tag{12}
$$

For this scheme the analysis is found by adding the increment to the background field in the centre of the time window. Thus we have

$$
x_a(t_1) = x_b(t_1) + \frac{\sigma_b^2\left(d_0 + d_2\right)}{\sigma_o^2 + 2\sigma_b^2}. \tag{13}
$$

A calculation of the analysis-error variance gives

$$
\sigma_a^2(t_1) = \frac{\sigma_o^2\,\sigma_b^2}{\sigma_o^2 + 2\sigma_b^2} + \frac{(2 - \beta)\,\sigma_b^4\left[2\sigma_o^2 + (2 - \beta)\,\sigma_b^2\right]}{\left(\sigma_o^2 + 2\sigma_b^2\right)^2}, \tag{14}
$$

where

$$
\beta = \alpha + \frac{1}{\alpha}. \tag{15}
$$

We note that when α = 1, so that there is no approximation in 3D-FGAT, we have β = 2 and the analysis-error variance is the same as that for incremental 4D-Var. In this case the analysis-error variance is bounded above by the smallest of the variances of the observational and background errors. For α ≠ 1, the second term of (14) introduces an error dependent on the factor 2 − β. Thus this term measures how close the identity approximation used in 3D-FGAT is to the true tangent linear model. The further the tangent linear model is from the identity, the larger this term will become. It is particularly important to note that, for values of α far from unity, this term may be arbitrarily large and so the analysis-error variance at the initial time may exceed the error variance of the inputs.
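The growth of the 3D-FGAT analysis-error variance with the distance of α from unity can be illustrated with a small sketch. It assumes the scalar setup of this section and takes β = α + 1/α, which is what propagating the background and observation errors through the 3D-FGAT analysis yields under that setup and which equals 2 at α = 1. The function names are hypothetical, and the 4D-Var reference value is the minimum-variance result for the same inputs evaluated at *t*_{1}.

```python
def fgat_analysis_variance(alpha, sigma_b2, sigma_o2):
    """Analysis-error variance at t_1 for the scalar 3D-FGAT example (sketch)."""
    # Weight given to each innovation by the 3D-FGAT inner loop
    gain = sigma_b2 / (sigma_o2 + 2.0 * sigma_b2)
    # Assumed form of beta; beta = 2 when the identity is exact (alpha = 1)
    beta = alpha + 1.0 / alpha
    # Analysis error = (1 - gain*beta) * eps_b + gain * (eps_0 + eps_2)
    return (1.0 - gain * beta) ** 2 * sigma_b2 + 2.0 * gain**2 * sigma_o2


def optimal_analysis_variance(alpha, sigma_b2, sigma_o2):
    """Minimum-variance (4D-Var) analysis-error variance at t_1 (sketch)."""
    return sigma_o2 * sigma_b2 / (sigma_o2 + (alpha**2 + alpha**-2) * sigma_b2)


if __name__ == "__main__":
    sigma_b2 = sigma_o2 = 1.0
    for alpha in (1.0, 1.1, 2.0, 5.0, 10.0):
        fgat = fgat_analysis_variance(alpha, sigma_b2, sigma_o2)
        best = optimal_analysis_variance(alpha, sigma_b2, sigma_o2)
        exceeds = fgat > max(sigma_b2, sigma_o2)
        print(f"alpha={alpha:5.1f}  3D-FGAT={fgat:.4f}  4D-Var={best:.4f}  "
              f"exceeds inputs: {exceeds}")
```

With unit input variances the two schemes agree at α = 1, the 3D-FGAT value is never below the 4D-Var value, and for α = 10 the 3D-FGAT variance is larger than both input variances.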

In this example we have assumed that the variance information of the background field is correct at the centre of the time window. However, by removing the model evolution of the perturbation, the evolution of the variance information is neglected, so that the innovations are weighted incorrectly. For the case where α > 1, so that the variance grows throughout the assimilation window, the innovation at time *t*_{0} is over-weighted with respect to the background and the innovation at time *t*_{2} is under-weighted (with the opposite occurring for α < 1). It is this incorrect use of the statistical information which leads to a sub-optimal analysis.
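The mis-weighting of the innovations can be made concrete by comparing the weight that 3D-FGAT gives to each innovation with the weights implied by the minimum-variance (4D-Var) analysis evolved to *t*_{1}. The sketch below assumes the scalar setup of this section; the function name `innovation_weights` is hypothetical, and the example values of α are chosen for illustration only.

```python
def innovation_weights(alpha, sigma_b2, sigma_o2):
    """Weights on the innovations d_0 and d_2 in the analysis at t_1 (sketch).

    Returns (w0_optimal, w2_optimal, w_fgat), where the optimal weights come
    from the 4D-Var analysis at t_0 multiplied by alpha to reach t_1, and
    w_fgat is the single weight 3D-FGAT applies to both innovations.
    """
    denom = alpha**2 * sigma_o2 + (1.0 + alpha**4) * sigma_b2
    w0_optimal = alpha * sigma_b2 / denom
    w2_optimal = alpha**3 * sigma_b2 / denom
    w_fgat = sigma_b2 / (sigma_o2 + 2.0 * sigma_b2)
    return w0_optimal, w2_optimal, w_fgat


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        w0, w2, wf = innovation_weights(alpha, sigma_b2=1.0, sigma_o2=1.0)
        print(f"alpha={alpha:3.1f}  optimal w0={w0:.3f}  optimal w2={w2:.3f}  "
              f"3D-FGAT weight={wf:.3f}")
```

For α = 2 with unit variances the 3D-FGAT weight lies above the optimal weight on *d*_{0} and below the optimal weight on *d*_{2}; for α = 0.5 the ordering is reversed, and for α = 1 all three weights coincide.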

If we consider a 3D-Var scheme applied to this system, so that the observations are assumed to be valid at the centre of the time window, then the analysis is found to be no longer unbiased. This arises from the fact that the innovations are calculated as

$$
d_0 = y_0 - x_b(t_1), \qquad d_2 = y_2 - x_b(t_1). \tag{16}
$$

This introduces terms in the expected analysis error dependent on the change in the true state between observation times. Terms involving the true state then also occur in the expression for the variance. Hence we see that 3D-FGAT theoretically removes a major source of error in 3D-Var, even if the analysis-error variance may be large.
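The bias of the 3D-Var analysis can be illustrated with a small Monte Carlo sketch. It assumes the scalar model *x*_{i+1} = α*x*_{i}, unbiased Gaussian errors, and 3D-Var innovations formed against the background at *t*_{1} for both observations; the function names, sample size and seed are arbitrary choices for illustration. Under these assumptions the expected analysis error is the 3D-Var gain multiplied by the departure of the true state at the two observation times from its value at *t*_{1}.

```python
import numpy as np


def threedvar_bias(alpha, x_true_t1, sigma_b2, sigma_o2):
    """Expected 3D-Var analysis error at t_1 for the scalar example (sketch)."""
    gain = sigma_b2 / (sigma_o2 + 2.0 * sigma_b2)
    # True state at t_0 and t_2 relative to t_1: x/alpha and alpha*x
    return gain * (x_true_t1 / alpha + alpha * x_true_t1 - 2.0 * x_true_t1)


def simulate_threedvar_error(alpha, x_true_t1, sigma_b2, sigma_o2,
                             n_samples=200_000, seed=0):
    """Sample mean of the 3D-Var analysis error over random input errors."""
    rng = np.random.default_rng(seed)
    x_b = x_true_t1 + rng.normal(0.0, np.sqrt(sigma_b2), n_samples)
    y_0 = x_true_t1 / alpha + rng.normal(0.0, np.sqrt(sigma_o2), n_samples)
    y_2 = alpha * x_true_t1 + rng.normal(0.0, np.sqrt(sigma_o2), n_samples)
    gain = sigma_b2 / (sigma_o2 + 2.0 * sigma_b2)
    # Both observations are treated as if valid at t_1
    x_a = x_b + gain * ((y_0 - x_b) + (y_2 - x_b))
    return float(np.mean(x_a - x_true_t1))


if __name__ == "__main__":
    args = dict(alpha=2.0, x_true_t1=1.0, sigma_b2=1.0, sigma_o2=1.0)
    print("expected bias :", threedvar_bias(**args))
    print("sampled mean  :", simulate_threedvar_error(**args))
```

The bias vanishes when α = 1 and otherwise scales with the true state itself, which is why terms involving the true state appear in the 3D-Var error statistics.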

### 5. Concluding remarks

Assimilation schemes based on 3D-FGAT are widely used as an extension to 3D-Var in cases where the implementation of a 4D-Var system is prohibitively expensive. In this note we have demonstrated a property of 3D-FGAT which has not previously been remarked on in the literature, namely that the variance of the analysis error may be greater than the variance of any or all of the inputs. This is an inherent property which arises from the approximation of the tangent linear model by the identity within the assimilation scheme, which is not accounted for statistically. It can be interpreted as a kind of representativeness error within the linear problem, where the error is in how well the approximate linearization of the nonlinear observation operator represents the exact linearization, rather than how well the observation operator represents the true mapping between the state and observation space. The neglect of this component of the error within the assimilation system may lead to a sub-optimal analysis, which can cause an increase in the analysis error even when all other prior statistical information is correctly specified. It is important not to confuse this assumption with the tangent linear assumption used in 4D-Var. Even in a linear situation, the assumption of 3D-FGAT may not hold.

Although these results have been demonstrated for simple examples, there is no reason to think that this problem will disappear as the model becomes more complicated. Rather, the problem may arise whenever the true tangent linear model matrix is far from the identity. However, it must be recognized that 3D-FGAT is still likely to be an improvement over 3D-Var, which assumes that the observations in a time window are all valid at the same time. In fact, operational practice has shown great benefits from moving to a 3D-FGAT scheme. Hence, these results are not designed to discourage the use of 3D-FGAT. Rather, they illustrate the importance of understanding the assumptions in this assimilation approach and their possible effects on the analysis error. One indication arising from this work is the importance of testing the validity of the 3D-FGAT approximation, in much the same way as a 4D-Var system is tested. Such a test was implemented by Weaver *et al.* (2003) in the design of a 3D-FGAT assimilation system for an ocean circulation model. More routine tests of this type in a 3D-FGAT system would indicate where the error in the approximation is high and so provide an indication of possible uncertainties in the 3D-FGAT analysis.
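One simple way such a validity check might be set up is sketched below. It is not the test of Weaver *et al.* (2003), whose details are not given here; it is an assumed diagnostic that compares the evolution of a perturbation under a model step with the perturbation left unchanged, which is what the identity approximation supposes. The names `identity_approximation_error` and `model_step` are hypothetical.

```python
import numpy as np


def identity_approximation_error(model_step, x, dx):
    """Relative size of the perturbation growth that the identity misses.

    model_step : callable advancing a state vector to an observation time
                 (hypothetical interface, any nonlinear or linear model)
    x          : reference state
    dx         : perturbation added to the reference state
    Returns ||M(x + dx) - M(x) - dx|| / ||dx||, which is zero when the
    perturbation is carried through the model step unchanged.
    """
    x = np.asarray(x, dtype=float)
    dx = np.asarray(dx, dtype=float)
    evolved_perturbation = model_step(x + dx) - model_step(x)
    return float(np.linalg.norm(evolved_perturbation - dx) / np.linalg.norm(dx))


if __name__ == "__main__":
    x = np.array([1.0, 2.0])
    dx = np.array([0.1, -0.1])
    # Scalar-multiple models standing in for the example x_{i+1} = alpha * x_i
    for alpha in (1.0, 1.1, 2.0):
        err = identity_approximation_error(lambda s: alpha * s, x, dx)
        print(f"alpha={alpha:3.1f}  relative error of identity: {err:.3f}")
```

For the linear scalar-multiple model the diagnostic reduces to |α − 1|, so a large value flags exactly the regime in which the analysis above suggests the 3D-FGAT analysis-error variance may become large.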

Finally, we comment that the approximation of the tangent linear model by the identity in 3D-FGAT can be considered an extreme example of using an approximate linear model in incremental 4D-Var. In most operational 4D-Var systems the linear model is not an exact linearization of the nonlinear model used in the outer loop, but often contains simplifications, such as different physical parametrizations or a different spatial resolution. Although we wished to concentrate on the 3D-FGAT approach in this note, a simplified incremental 4D-Var scheme could be analysed in the same way. It is clear from the analysis presented here that, if the error between the approximate and true linear model is not accounted for statistically, then this may lead to the increase in analysis-error variance illustrated in this note.