### Abstract

The justification for the standard four-dimensional variational data assimilation (4D-Var) method used at several major operational centres assumes a perfect forecast model, which is clearly unrealistic. However, the method has been very successful in practice. We investigate the reasons for this using a toy model with fast and slow time-scales and with non-random model error. The model error is chosen so that the solution remains predictable on both time-scales. The fast modes are much less well observed than the slow modes. We show that poorly observed modes can be best forecast by using a regularization matrix in place of the background-error covariance matrix, and using it to give a much stronger constraint than that implied by the true background error for these modes. The effect is that use can be made of observations over a longer time period. This allows the resulting forecast-error growth to be reduced to much less than that of random perturbations generated using the analysis-error covariance matrix and, given sufficiently accurate observations, even less than the model error growth. © Crown Copyright 2010. Published by John Wiley & Sons, Ltd.

### 1. Introduction

Four-dimensional variational data assimilation (4D-Var) has been used successfully at many major operational centres for several years. Examples are the Met Office (Rawlins *et al.*, 2007) and the European Centre for Medium-Range Weather Forecasts (ECMWF: Klinker *et al.*, 2000). In each case the operational introduction of 4D-Var has resulted in significant improvements in performance. However, the theoretical justification of 4D-Var, for instance given by Lorenc (1986), assumes a linear and perfect forecast model and Gaussian errors with zero mean in the background forecast and observations. Under these assumptions, 4D-Var gives a statistically optimal estimate of the state of the atmosphere. As a result, much research has been carried out since the operational introduction of 4D-Var to improve the formulation so that these assumptions can be relaxed. Examples are the nonlinear transform used to assimilate humidity into the ECMWF model by Hólm (2007) and the use of weak-constraint 4D-Var (Trémolet, 2006) to allow for model error.

It is difficult, however, to reconcile the theoretical limitations of 4D-Var with its practical success in situations far from those for which it is valid. This has motivated studies of which aspect of 4D-Var contributes most to its performance, for instance those by Lorenc and Rawlins (2005) and Laroche *et al.* (2007), which demonstrate that much of the improvement comes from incorporating a linearized version of the forecast model within the system. Diagnostic studies, for instance that by Cardinali *et al.* (2004), show that most of the information in an analysis comes from earlier observations via the background state. In the ECMWF 12 hour cycled system, Cardinali *et al.* showed that 85% of the information typically came from the background. The effect is that the analysis error is only slightly smaller than the background error.

Satisfactory performance of a cycled system requires that the growth of the analysis error during the assimilation window is on average compensated for by the reduction in error due to the observations. This is achieved in operational systems. Since the next background error is the analysis error evolved through the assimilation window, this can only be reconciled with Cardinali *et al.*'s results if the analysis error does not project strongly on to rapidly growing modes. The theory of 4D-Var shows that the analysis preferentially uses rapidly growing modes to fit the data; thus the analysis error in these modes is small. This is confirmed by toy-model experiments, e.g. Trevisan *et al.* (2010). The evidence from operational performance (Piccolo, 2010) is that this must be done efficiently, so that the subsequent error growth during the assimilation window is small. This is despite the fact that the background-error covariance matrix used in all current operational systems is essentially climatological, and thus contains considerable averaging. It is thus likely that this matrix underestimates the true errors in rapidly growing modes. This appears inconsistent with the observed efficiency of the analysis in correcting them.

In this article we illustrate how optimum forecast performance is obtained by forcing the analysis to use only slowly growing modes to fit the observations. This results in greater weight being given to observations from earlier assimilation cycles. Given sufficiently accurate observations, we show that the subsequent error growth in the forecast can also be reduced. We demonstrate this using the three-body model used for studies of 4D-Var by Watkinson (2006). This model supports rapidly growing perturbations, so is suitable for investigating the issue raised in the previous paragraph. The three bodies are referred to as sun, planet and moon. In this model there are two time-scales: a slow time-scale associated with the motion of the planet round the sun and a fast time-scale associated with the motion of the moon round the planet. We aim to make useful predictions of both modes. The case in which the fast time-scale is not accurately predicted by the model and has to be treated as ‘noise’ is discussed in a companion article (Cullen, 2010). We use two different versions of this model as the ‘truth’ model, which generates the trajectory from which the observations are drawn, and the ‘forecast’ model. This ensures that the forecast diverges from the truth unless the observations are successfully assimilated, as is the case in the real atmospheric system.

We can justify the use of a smaller background-error covariance matrix by thinking of 4D-Var as a method of regularizing the otherwise ill-posed problem of fitting a model state to the observations. This is described in Johnson *et al.* (2005b), where it is shown that 4D-Var corresponds to a Tikhonov regularization using the forecast background. We show that the 4D-Var algorithm can be re-interpreted as a regularization using a complete model trajectory, under the assumption that the model trajectory is accurately represented by the evolution of the Jacobian of the model starting from a given initial state. The studies of Lorenc and Rawlins and of Laroche *et al.* cited above suggest that the use of the trajectory is an essential part of the success of 4D-Var. The ‘optimal’ regularization would be the choice of background-error covariance matrix that minimized the short-range forecast error, which will not necessarily be the ‘true’ background-error covariance matrix.

There is a close link between this procedure and the use of a model-state control variable to represent model error in weak-constraint 4D-Var (Trémolet, 2006). If we consider an arbitrarily long window, so that the background becomes irrelevant, the weak-constraint method fits a time sequence of observations with a model trajectory to which small corrections are applied periodically. The regularization approach would seek the smallest corrections that would have to be made to a model trajectory to enable it to fit the observations to within the observational error over a long time period. The optimal-state estimation approach would make the corrections depend on the model error, so that large corrections could be made to modes where the model is inaccurate. We show that in situations where the model error growth is slower than the growth of perturbations under the action of the model, the regularization approach makes corrections to the trajectory that are smaller than the model error and is successful in improving the forecasts. However, unlike the long-window approach, the trajectory is computed sequentially rather than by solving a simultaneous minimization problem. It would be of interest to see whether further benefit could be obtained by applying the same regularization matrix within a simultaneous minimization problem.

All calculations have been carried out using Mathematica® 6.0 (Wolfram, 2007).

### 2. Formulation of variational assimilation

We use the standard notation for describing variational data assimilation defined by Ide *et al.* (1997). The conventional formulation of 4D-Var can then be written as follows. Suppose time is discretized, with the index *j* denoting time steps. Define a state vector **x**_{j}: for each *j* this is an *l*-dimensional vector. We denote truth values of **x**_{j} as **x**_{t,j}. Assume truth values evolve forward through one time step under the nonlinear operator *N*, so that

$$\mathbf{x}_{t,j} = N(\mathbf{x}_{t,j-1}), \qquad (1)$$

and that we have a nonlinear forecast model *M*_{j,j−1} that evolves **x**_{j−1} forward for one time step so that

$$\mathbf{x}_j = M_{j,j-1}(\mathbf{x}_{j-1}). \qquad (2)$$

In weak-constraint 4D-Var (Trémolet, 2006), we assume that *N***x** = *M***x** + **q**, where **q** represents the model error.

We assume we have *m* observations **y**_{i}: 1 ≤ *i* ≤ *m* and a nonlinear observation operator *H*_{i} generating the *i*th observation from a four-dimensional state {**x**_{t,j}: 0 ≤ *j* ≤ *n*} valid at the same time. *n* is the number of time steps. Assume the observations have uncorrelated zero-mean Gaussian observation errors **v**_{i} with covariance **R**. Then we write

$$\mathbf{y}_i = H_i(\{\mathbf{x}_{t,j}\}) + \mathbf{v}_i. \qquad (3)$$

Denote the Mahalanobis norm for column vector **x** and matrix **A** by

$$\|\mathbf{x}\|_{\mathbf{A}}^2 = \mathbf{x}^{\mathrm T}\,\mathbf{A}^{-1}\,\mathbf{x}.$$

Then strong-constraint 4D-Var implies that

$$J(\mathbf{x}_0) = \tfrac12\,\|\mathbf{x}_0 - \mathbf{x}_b\|_{\mathbf{B}}^2 + \tfrac12\sum_{i=1}^{m}\|\mathbf{y}_i - H_i(\mathbf{x}_{(\cdot)})\|_{\mathbf{R}}^2, \qquad (4)$$

where **x**_{b} is the background state valid at *t* = 0. This requires determination of a single state vector **x**_{0} and a means of calculating the four-dimensional state **x**_{(·)} from it. The perfect model assumption is made so that (2) is used for this purpose. The statistical assumption behind the formulation of the observation term in *J*(**x**_{0}) is stated before (3). The assumption behind the background term is that we can write

$$\mathbf{x}_b = \mathbf{x}_{t,0} + \mathbf{x}', \qquad (5)$$

where **x**′ is a Gaussian random variable with zero mean and covariance **P**. **P** can evolve in time, as in Kalman filter theory. We also assume that **x**′ is uncorrelated with **v**_{i} for all *i*.
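The cost function (4) under these assumptions can be evaluated by integrating the model forward from **x**_{0} and accumulating the weighted misfits. The following is a minimal sketch, not the paper's implementation: dimensions, the toy doubling model, and the convention of one observation per time step with inverse covariances passed directly are all assumptions made for illustration.

```python
import numpy as np

def cost(x0, xb, Binv, ys, Hs, step, Rinv):
    """Strong-constraint 4D-Var cost, Eq. (4): background misfit plus the
    misfit to each observation along the model trajectory from x0 (Eq. 2)."""
    J = 0.5 * (x0 - xb) @ Binv @ (x0 - xb)
    x = x0
    for y, H in zip(ys, Hs):
        x = step(x)          # advance the (assumed perfect) model one step
        d = y - H @ x        # innovation at this observation time
        J += 0.5 * d @ Rinv @ d
    return J

# Tiny usage with a hypothetical 1-D doubling model and one exact observation
xb = np.array([1.0])
obs = [np.array([2.0])]      # observation of the state after one step
Hs = [np.eye(1)]
J = cost(np.array([1.0]), xb, np.eye(1), obs, Hs, lambda x: 2.0 * x, np.eye(1))
print(J)  # 0.0: the trajectory from the background fits the observation exactly
```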

Following Lorenc (1986), Cullen (2010) shows that in the strong-constraint formulation of 4D-Var, *J*_{b} can be rewritten as

$$J_b = \tfrac12\,\{\mathbf{M}_{pj,0}(\mathbf{x}_0 - \mathbf{x}_b)\}^{\mathrm T}\,\mathbf{P}_{pj}^{-1}\,\{\mathbf{M}_{pj,0}(\mathbf{x}_0 - \mathbf{x}_b)\}, \qquad (6)$$

where **M**_{pj,0} is the Jacobian matrix of *M* evaluated at **x**_{b}, the cost function is evaluated every *p* time steps and the background-error covariance grows according to the equation

$$\mathbf{P}_{pj} = \mathbf{M}_{pj,0}\,\mathbf{P}\,\mathbf{M}_{pj,0}^{\mathrm T}. \qquad (7)$$

Equations (6) and (7) are only valid for Gaussian background errors with zero mean and a perfect linear forecast model.

Essentially the same argument applies in incremental 4D-Var, where the forecast model is nonlinear but perturbations are assumed to obey linear equations. The effect of (7) is then that the error growth is most rapid for the most rapidly growing singular vector, as illustrated for an idealized problem by Johnson *et al.* (2005a). If nonlinear 4D-Var is used, then (7) will be in error because the assumption of Gaussianity is not valid under nonlinear evolution. Since the initial error growth is linear, the Gaussian assumption will be valid for a while and the fastest growing structures may not be too badly estimated. The total error growth over the assimilation window will be incorrect if the linearity of the perturbation growth breaks down. If the model is imperfect, but the model errors are not too large, the estimated error growth may still be accurate enough to be useful.
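The concentration of error growth in the most rapidly growing structures can be seen in a minimal numerical illustration of the covariance propagation (7). The two-mode diagonal model below is a hypothetical stand-in, not the three-body system: one mode doubles each step, the other is nearly neutral.

```python
import numpy as np

# Hypothetical 2x2 linear model illustrating Eq. (7): P_j = M_{j,0} P_0 M_{j,0}^T
M = np.array([[2.00, 0.00],   # fast mode: perturbations double each step
              [0.00, 1.01]])  # slow mode: nearly neutral

P0 = 0.1 * np.eye(2)          # isotropic initial background error

P = P0.copy()
for _ in range(5):            # evolve the covariance over a 5-step window
    P = M @ P @ M.T

# The fast-mode variance grows by a factor 4^5 = 1024; the slow mode barely moves,
# so the evolved background error is dominated by the fast-growing direction.
print(P[0, 0] / P0[0, 0], P[1, 1] / P0[1, 1])
```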

We next rewrite *J*_{o} in (4) as a function of **x**_{0} as in Lorenc and Payne (2007). First assume **x**_{0} − **x**_{b} is small, as in the justification of incremental 4D-Var. We can then write

$$\mathbf{x}_{(\cdot)} = M_{(\cdot,0)}(\mathbf{x}_b) + \mathbf{M}_{(\cdot,0)}(\mathbf{x}_0 - \mathbf{x}_b), \qquad (8)$$

where *M* propagates the initial state forward to all times within the window using (2). Then write

$$J_o = \tfrac12\,\{\mathbf{H}\mathbf{M}(\mathbf{x}_0 - \mathbf{x}_b) - \mathbf{y}'\}^{\mathrm T}\,\mathbf{R}^{-1}\,\{\mathbf{H}\mathbf{M}(\mathbf{x}_0 - \mathbf{x}_b) - \mathbf{y}'\}, \qquad (9)$$

where **y**′ = {**y**_{i} − *H*_{i}(*M*_{(·,0)}(**x**_{b}))} and **H** is the Jacobian matrix of *H*, which includes the selection of observation times as well as positions. We only consider the evolution of errors in the space spanned by **M**, so that **M** is invertible. Suppose **x**_{0} = **x**_{a} minimizes (4). Since the gradient of *J* is zero at the minimum, we can show that

$$\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\mathbf{y}', \qquad (10)$$

where the gain matrix **K** is defined by

$$\mathbf{K} = \mathbf{P}\,(\mathbf{HM})^{\mathrm T}\,\{\mathbf{HM}\,\mathbf{P}\,(\mathbf{HM})^{\mathrm T} + \mathbf{R}\}^{-1}. \qquad (11)$$

If all the observations are at the end of the 4D-Var window, we can consider 4D-Var to be equivalent to 3D-Var at the end of the window with a background-error covariance matrix given by (7). The analysis error at the end of the window is **MAM**^{T}, which will also be the background error for the next cycle.
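The equivalence just stated can be checked numerically: with observations only at the end of the window, the evolved 4D-Var increment equals the 3D-Var increment computed with the evolved background error **B** = **MPM**^{T}. The matrices below are hypothetical small examples, not taken from the paper.

```python
import numpy as np

# Assumed 3-state linear model over the window and a 2-component observation
# operator acting at the end of the window.
M = np.array([[2.0, 0.3, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.4, 1.2]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])   # observe the first and third components
P = np.eye(3)                     # background error at the start of the window
R = 0.5 * np.eye(2)               # observation error

# 4D-Var gain acting on the initial state, Eq. (11)
HM = H @ M
K4d = P @ HM.T @ np.linalg.inv(HM @ P @ HM.T + R)

# 3D-Var at the end of the window with evolved background error B = M P M^T
B = M @ P @ M.T
K3d = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

# The evolved 4D-Var increment M K4d y' equals the 3D-Var increment K3d y'
print(np.allclose(M @ K4d, K3d))
```

The agreement is an algebraic identity, since M K4d = M P M^T H^T (H M P M^T H^T + R)^{-1} = B H^T (H B H^T + R)^{-1}.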

In the present article, we consider the problem of cycled 4D-Var where the same background-error covariance matrix is used for all the analysis cycles. This reflects current operational practice in many centres. Usually this **B** matrix is regarded as an estimate of the ‘true’ background error **P**, which will evolve through the cycles. Use of **B** instead of **P** is justified under a stationarity assumption. In the present case we use a regularization matrix in the sense of Johnson *et al.* (2005b). Since this may be very different from an estimate of **P**, we write it as **C**. If too small a value of **C** is used, it is possible that the analysis may be unable to stay close to the observations (equivalent to filter divergence in a true Kalman filter). If too large a value is used, the analysis will largely depend on the current set of observations. If these are incomplete, the analysis error may become very large. These effects are demonstrated in Fisher (2007).

The use of a regularization matrix **C** instead of **P** means that (11) becomes

$$\mathbf{K} = \mathbf{C}\,(\mathbf{HM})^{\mathrm T}\,\{\mathbf{HM}\,\mathbf{C}\,(\mathbf{HM})^{\mathrm T} + \mathbf{R}\}^{-1}. \qquad (14)$$

A direct calculation of < (**x**_{a} − **x**_{t,0})(**x**_{a} − **x**_{t,0})^{T} >, where < · > denotes an expectation, then gives

$$\mathbf{A} = (\mathbf{I} - \mathbf{K}\mathbf{H}\mathbf{M})\,\mathbf{P}\,(\mathbf{I} - \mathbf{K}\mathbf{H}\mathbf{M})^{\mathrm T} + \mathbf{K}\mathbf{R}\mathbf{K}^{\mathrm T}, \qquad (15)$$

instead of (13).

Satisfactory performance of a cycled system requires that the increase in **P** due to the time evolution is balanced by the reduction in **P** due to the observations. In the case of a perfect linear model where the true **P** is used in the analysis, this balance is expressed by

$$\mathbf{P} = \mathbf{M}\,(\mathbf{I} - \mathbf{K}\mathbf{H}\mathbf{M})\,\mathbf{P}\,\mathbf{M}^{\mathrm T}. \qquad (16)$$

If we allow for model error and assume that the model error accumulated over an assimilation window has a Gaussian distribution with zero mean and covariance **Q**, and is additionally uncorrelated with the analysis error and the observation errors, then (16) becomes

$$\mathbf{P} = \mathbf{M}\,(\mathbf{I} - \mathbf{K}\mathbf{H}\mathbf{M})\,\mathbf{P}\,\mathbf{M}^{\mathrm T} + \mathbf{Q}. \qquad (17)$$

If a regularization matrix **C** is used instead of **P** in the analysis, the equivalent of (16) is obtained by replacing **P** by **MAM**^{T} in (15) and the equivalent of (17) is obtained by replacing **P** by **MAM**^{T} + **Q** in (15).

Considering again the diagonal case, we can see from (15) that if, for some *k*, *C*_{k} is underestimated then *K*_{k} will be reduced and so *A*_{k} ≃ *P*_{k}. Therefore the growth in *P*_{k} under the action of the model will not be compensated for and *P*_{k} will grow. Now suppose that *C*_{k} is overestimated. In this case we can show that (14) implies that **K** → (**HM**)^{−1} if **HM** is invertible, in which case (15) implies that **A** is bounded by

$$\mathbf{A} \le (\mathbf{HM})^{-1}\,\mathbf{R}\,(\mathbf{HM})^{-\mathrm T}. \qquad (18)$$

In 3D-Var, when **M** = **I**, this will only be finite if the observations are complete. In 4D-Var it can be finite for incomplete observations if the action of the model through **M** introduces sufficient multivariate coupling. This is an important advantage of 4D-Var.
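The limiting behaviour described above, where an overestimated **C** drives **K** towards (**HM**)^{−1} and the analysis error towards the bound (18), can be checked with the Joseph-form expression (15). The matrices below are assumed small examples with complete observations, not values from the paper.

```python
import numpy as np

# Assumed 3-state linear model; complete observations so that HM is invertible.
M = np.array([[2.0, 0.3, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.4, 1.2]])
H = np.eye(3)
P = np.eye(3)        # true background-error covariance
R = 0.2 * np.eye(3)  # observation-error covariance
HM = H @ M

def analysis_cov(C):
    """Analysis error when the gain is built from a regularization matrix C
    instead of the true P: the Joseph-form expression, Eq. (15)."""
    K = C @ HM.T @ np.linalg.inv(HM @ C @ HM.T + R)  # Eq. (14)
    I_KHM = np.eye(3) - K @ HM
    return I_KHM @ P @ I_KHM.T + K @ R @ K.T

# As C is made very large, K tends to (HM)^{-1} and the analysis error
# approaches the bound (HM)^{-1} R (HM)^{-T} of Eq. (18).
bound = np.linalg.inv(HM) @ R @ np.linalg.inv(HM).T
A_big = analysis_cov(1e6 * np.eye(3))
print(np.allclose(A_big, bound, atol=1e-3))
```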

It is not practical to construct a formal optimization problem to find **C** to minimize the forecast error, since the result would also depend on the observations. Two practicable strategies are discussed.

The first is to ensure that the error reduction in rapidly growing modes given by (13) is large enough to compensate for the growth. This requires a sufficiently large **C**, but also sufficient observations to ensure that (18) controls the analysis error. Experiments with the three-body model using 3D-Var, to which (13) also applies with **M** replaced by **I** in (18), showed that satisfactory performance of a cycled system could not be obtained with incomplete observations. 4D-Var performs better because of the multivariate coupling introduced by the use of **M** in (18).

The second strategy is to fit the model trajectory to the observations over a longer time period. This is the justification for long-window 4D-Var (Trémolet, 2006). In order to exploit information from observations over a long time period, it is necessary to use the smallest possible **C**, which acts as a proxy for the model error corrections used in long-window schemes. In order to prevent systematic error growth over the cycles, it is at least necessary that the analysis increment evolved over the window is as large as the model error growth during the window. In the diagonal case, this requires

$$M_k^2\,C_k \ge Q_k. \qquad (19)$$

If the perturbation grows under the action of the model, this implies *C*_{k} < *Q*_{k}. Note that this is different from the optimal-state estimation approach, which would set **C** = **Q**. Fisher *et al.* (2005) show that a large **Q** acts as a ‘forgetting factor’, so that information from past observations is not used to analyse modes where the model error is large. Equation (19) shows that perturbation growth in the model can be exploited to allow smaller increments to be made. This allows more information from earlier observations to be retained.
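The stability requirement discussed in this paragraph can be seen in a minimal scalar recursion. The setup below is an assumption made for illustration (a single growing mode with deterministic per-cycle model error and a perfect observation), deliberately far simpler than the three-body model: the point is only that a cycled analysis on a growing mode stays bounded provided the gain is large enough that the evolved, corrected error contracts from cycle to cycle.

```python
# Scalar sketch of a cycled analysis on a growing mode (assumed values).
# Each cycle the analysis error e is amplified by M and offset by a
# deterministic model error q, then corrected with the scalar gain
# k = C/(C + r), the one-dimensional reduction of Eq. (14).
M, q, r = 2.0, 0.5, 1e-4
C = 5e-4                 # regularization variance
k = C / (C + r)          # here (1 - k) * M = 1/3 < 1, so the cycle contracts

e = 1.0                  # initial analysis error
for _ in range(100):
    e_f = M * e + q      # forecast error: growth plus model error
    e = (1 - k) * e_f    # analysis error after assimilating a perfect observation

# e converges to (1 - k) * q / (1 - (1 - k) * M) = 0.125: bounded despite M = 2
print(e)
```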

Suppose that we choose a **C** matrix and use it to complete a set of analysis cycles that fits the observations to within observational error. This means that a model trajectory to which a correction has been added at the end of each window fits the observations to within the observational error over a long period. The statistics of the analysis increments will then define an upper bound on the size of the increments necessary to maintain the fit to the observations. These statistics can be used to define a new **C** matrix and the cycles repeated. There is clearly no guarantee that this method will converge, or that it will find the smallest **C** that is sufficient to do the job independently of the first guess.
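The iteration described above can be sketched as follows, again in an assumed scalar setting (all numerical values hypothetical): run a set of cycles with a trial **C**, collect the analysis increments, and use their variance as the next **C**. As the text notes, convergence is not guaranteed.

```python
import numpy as np

def run_cycles(C, n_cycles=500, M=2.0, q=0.1, r=1e-4, seed=0):
    """One pass of scalar analysis cycles with regularization variance C;
    returns the variance of the analysis increments it produced."""
    rng = np.random.default_rng(seed)
    k = C / (C + r)                       # scalar gain, Eq. (14) in one dimension
    e, incs = 0.0, []
    for _ in range(n_cycles):
        e_f = M * e + q                   # forecast error over the window
        v = rng.normal(scale=np.sqrt(r))  # observation error
        inc = k * (v - e_f)               # analysis increment
        incs.append(inc)
        e = e_f + inc
    return np.var(incs[100:])             # discard spin-up cycles

C = 1.0                                   # first-guess regularization
for _ in range(10):
    C = run_cycles(C)                     # next C from increment statistics
print(C)
```

With these assumed values the iteration settles at a small C, of the order of the observation-error variance rather than the squared model error, but that outcome is specific to this toy recursion.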

### 5. Discussion

We have tested cycled 4D-Var with an imperfect model that supports two time-scales, both of which are regarded as predictable. The assimilation window was chosen to be comparable to the fast time-scale, and the same mix of observations was used in each window. The effect is that the growth of errors on the fast time-scale can be large between observation times, while that of errors on the slow time-scale is small. It is shown that the large growth of analysis increments on the fast time-scale during the window can be exploited to construct a model trajectory that is almost an interpolator between the observations at different times; only a very small increment is added at each analysis time. For large model errors, this increment is such that the evolved increment during an assimilation window is comparable to the model error accumulated over a window. The effect is that the forecast errors can be reduced to values smaller than the model error growth over the same period, provided the observations are accurate enough and used over a sufficiently long time interval. This is confirmed by the sensitivity of the results to the observation error. Though the error is controlled by the accuracy of the observations, the size of the error is much less than that implied by the normal observability calculation for a single window. Thus the accuracy of the forecasts shows that effective use is being made of observations spread over multiple windows. If more observations are used in each window, the effect is degraded, because less use is made of the observations in previous windows and the time interpolation effect, which allows the model error growth to be compensated for, is reduced.

The forecast errors in the slow mode, which is well observed, are controlled by the model error. The time interpolation of observations does not take place over a long enough period to compensate for the model error. However, if the observation coverage is degraded, optimization of the **C** matrix becomes effective in reducing the forecast error and the forecast errors can be reduced below those implied by the model error growth.

In situations where the forecast-error growth is less than the model error growth, the analysis error must be compensating for future model error growth, so the analysis itself is suboptimal. This explains why the standard theory set out in section 2 does not describe what is happening. The behaviour is more like that of a long-window 4D-Var (Trémolet 2006), where small model-state corrections are added periodically to allow a model trajectory to be fitted to a set of observations over a long time period. The present method is a degenerate example of this procedure, where the trajectory is built up in short sequential steps rather than by a simultaneous minimization over the whole window. However, the recalculation of **C** does introduce a dependence on the whole assimilation period of 300 cycles. It would be of interest in future work to see whether further benefit is obtained from a simultaneous minimization.

There is an important difference from the long-window approach of Fisher *et al.* (2005) and Trémolet (2006) in that the corrections that give the optimal forecasts are much less than would be implied by the size of the model error. This appears to be mainly because (19) permits the analysis increments to be much less than the model error accumulated over a window for rapidly growing modes of the model. However, the model error used in this study does not conform to the statistical assumptions used to define the model error in Kalman filtering, so there is no reason why that theory should predict the optimal size of the increments.

The experiments shown in this article share the characteristic behaviour of operational systems in that the analysis error is not much less than the background error. The main difference in behaviour is that 3D-Var works well in the operational context but is unable to control error growth in the three-body system. In the three-body system, the typical perturbation growth for the fast variables dominates the model error growth. This results in the best forecasts being obtained with a **C** that is much less than that implied by model error growth. Assuming that the viability of 3D-Var means that the finite-amplitude error growth in real systems is not large over an assimilation window, which is consistent with the diagnostics in Piccolo (2010), it is likely that the increments in real systems will have to be comparable to the model error.

The issue for practical application is how to choose **C**. In well-observed situations, such as the slow mode in most of the experiments, the optimum performance is given by setting **C** to a climatological estimate of the true background error, so the behaviour is consistent with standard theory. However, the optimum performance for the fast mode is obtained by using much smaller values of **C**. The use of analysis increments to determine **C** is a practical way of achieving the required behaviour, since the analysis increments found by 4D-Var will tend to be small for rapidly growing modes.

In the three-body system, the best forecasts are not consistently obtained from the **C** that gives the best analyses. This can be seen in Figures 10 and 16. This illustrates that the **C** most appropriate for accurate forecasts may not be the same as that appropriate for applications where the analysis itself is the most important product, such as reanalysis. This issue has been widely recognized.