The relative advantages and disadvantages of four-dimensional variational data assimilation (4D-Var), already operational in several numerical forecasting centres, and Ensemble Kalman Filter (EnKF), a newer approach that does not require the adjoint of the model, are the focus of considerable current research (e.g. Lorenc, 2003; Kalnay et al, 2007a,b; Gustafsson, 2007; WWRP/THORPEX Workshop, 2008).
One area where 4D-Var may have an advantage over EnKF is in the initial spin-up, since there is evidence that 4D-Var converges faster than EnKF to its asymptotic level of accuracy. For example, Caya et al (2005) compared 4D-Var and EnKF for a simulated storm developing in an environment given by a sounding corresponding to 0000 UTC on 25 May 1999. They found that:
Overall, both assimilation schemes perform well and are able to recover the supercell with comparable accuracy, given radial-velocity and reflectivity observations where rain was present. 4D-Var produces generally better analyses than EnKF given observations limited to a period of 10 min (or three volume scans), particularly for the wind components. In contrast, EnKF typically produces better analyses than 4D-Var after several assimilation cycles, especially for model variables not functionally related to the observations.
In other words, for the severe storm problem, EnKF eventually yields better results than 4D-Var, presumably because of the assumptions made in the 4D-Var background-error covariance, but during the crucial initial time of storm development, when radar data start to become available, EnKF provides a worse analysis. For a global shallow-water model, which is only mildly chaotic, Zupanski et al (2006) found that initial perturbations that had horizontally correlated errors converged faster and to a lower level of error than perturbations created with white noise. In agreement with these results, Liu (2007) found using the SPEEDY global primitive-equations model that perturbations obtained from differences between randomly chosen states (which are naturally balanced and have horizontal correlations of the order of the Rossby radius of deformation) spun up faster than white noise perturbations.
Yang et al (2009) compared 4D-Var and the Local Ensemble Transform Kalman Filter (LETKF; Hunt et al, 2007) within a quasi-geostrophic (QG) channel model (Rotunno and Bao, 1996). They found that, if the LETKF is initialized from randomly chosen fields with uniformly distributed perturbations, it takes more than 100 days before it converges to the optimal level of error. If, on the other hand, the ensemble mean is initialized from an existing 3D-Var analysis, which is already close to the true state, using the same random perturbations, the LETKF converges to its optimal level very quickly, within about 3 to 5 days. However, with a well-tuned background-error covariance, 3D-Var and 4D-Var converge quickly without needing a good initial guess. This has also been observed for severe storm simulations (Caya et al, 2005), especially when using real radar observations (Jidong Gao, 2008, personal communication). It is not surprising that EnKF spins up more slowly than 3D-Var or 4D-Var because, in order to be optimal, the ensemble has to satisfy two independent requirements, namely that the mean be close to the true state of the system, and that the ensemble perturbations represent the characteristics of the ‘errors of the day’ in order to estimate the evolving background-error covariance B. In both 3D-Var and 4D-Var, by contrast, B is tuned a priori and assumed to be constant.
Within a global operational system it is feasible to initialize the EnKF from a state close enough to the optimal analysis, such as an existing 3D-Var analysis, with balanced perturbations drawn from a 3D-Var error covariance, so that spin-up may not be a serious problem. However, there are other situations, such as the storm development discussed above, where radar information is not available before the storm starts, so that no information is available to guide the EnKF in the spin-up towards the optimal analysis. The system may start from an unperturbed state without precipitation, and if a severe storm develops within a few minutes and the EnKF takes considerable real time to spin up from the observations, it will not give reliable forecasts until later in the storm evolution, and will thus give results that are less useful for severe storm forecasting than those of 4D-Var or even 3D-Var. Similarly, a regional model initialized from a global analysis at lower resolution may take too long to spin up when confronted with mesoscale observations.
In this note we propose a new method to accelerate the spin-up of the EnKF by ‘running in place’ (RIP) during the spin-up phase and using the observations more than once in order to extract maximum initial information. We find that it is possible to accelerate the convergence of the EnKF so that (in real time) it spins up even faster than 3D-Var or 4D-Var without losing accuracy after spin-up and without requiring prior information such as a ‘well tuned’ initial background-error covariance. After spinning up, RIP is automatically turned off and the system returns to the standard EnKF formulation. Section 2 contains a brief theoretical motivation and discussion of the method, results are presented in section 3 and a discussion is given in section 4.
2. Spin-up, no-cost smoothing and ‘running in place’ in EnKF
In this section we briefly review the conditions that justify the rule that, in the Kalman Filter, data should be used once and then discarded. We then suggest that this rule is not strictly valid during spin-up, when the initial covariance is still influencing the results, or when the statistics of the ‘errors of the day’ suddenly change due to strong nonlinearity, as during the initial development of a severe storm. During these transition periods, the ensemble perturbations are not representative of the ‘errors of the day’ and extracting information from observations using them only once is not efficient.
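Hunt et al (2007) provide a derivation of the linear Kalman Filter equations by showing that in the cost function

$$J(\mathbf{x}) = [\mathbf{x} - \bar{\mathbf{x}}^b_n]^T (\mathbf{P}^b_n)^{-1} [\mathbf{x} - \bar{\mathbf{x}}^b_n] + [\mathbf{y}^o_n - \mathbf{H}_n \mathbf{x}]^T \mathbf{R}_n^{-1} [\mathbf{y}^o_n - \mathbf{H}_n \mathbf{x}] \qquad (1)$$

the background term represents the Gaussian distribution of a state with the maximum likelihood trajectory (history), i.e. the analysis/forecast trajectory that best fits the data from t = t1, …, tn−1. In (1), $\bar{\mathbf{x}}^b_n$ is the background state, $\mathbf{y}^o_n$ is the vector of observations at t = tn, $\mathbf{H}_n$ is the linear observation operator, and $\mathbf{P}^b_n$ and $\mathbf{R}_n$ are the corresponding error covariances. The background state is obtained by using the forecast model M to advance the previous maximum likelihood analysis at t = tn−1 to the new analysis time tn. Taking logarithms of the Gaussian distribution, this means that for some constant c,

$$\sum_{j=1}^{n-1} [\mathbf{y}^o_j - \mathbf{H}_j \mathbf{M}_{t_n,t_j} \mathbf{x}]^T \mathbf{R}_j^{-1} [\mathbf{y}^o_j - \mathbf{H}_j \mathbf{M}_{t_n,t_j} \mathbf{x}] = [\mathbf{x} - \bar{\mathbf{x}}^b_n]^T (\mathbf{P}^b_n)^{-1} [\mathbf{x} - \bar{\mathbf{x}}^b_n] + c \qquad (2)$$

where $\mathbf{M}_{t_n,t_j}$ denotes the linear model that maps a state at tn to the corresponding state at tj. After the cost function in (1) is minimized, finding the analysis $\bar{\mathbf{x}}^a_n$ and its corresponding covariance $\mathbf{P}^a_n$, a similar relationship holds for the analysis at tn, and another constant c′:

$$[\mathbf{x} - \bar{\mathbf{x}}^b_n]^T (\mathbf{P}^b_n)^{-1} [\mathbf{x} - \bar{\mathbf{x}}^b_n] + [\mathbf{y}^o_n - \mathbf{H}_n \mathbf{x}]^T \mathbf{R}_n^{-1} [\mathbf{y}^o_n - \mathbf{H}_n \mathbf{x}] = [\mathbf{x} - \bar{\mathbf{x}}^a_n]^T (\mathbf{P}^a_n)^{-1} [\mathbf{x} - \bar{\mathbf{x}}^a_n] + c' \qquad (3)$$

Equating the terms in (3) that are linear and quadratic in x, the linear Kalman Filter equations for a perfect model are obtained.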
This derivation makes clear that the Kalman Filter yields the maximum likelihood estimate with the corresponding error covariance at time tn if the model is linear and perfect and if the previous analysis at tn−1 is also the maximum likelihood state estimate at the previous analysis time. Hunt et al (2007) also address the problem of initialization: a system can be initialized assuming a prior background distribution at the initial time t0 such that the initial background error covariance is large but not infinitely large. This introduces into the cost function an additional quadratic term, but Hunt et al (2007) point out that ‘with sufficient observations over time, the effect of this term [on the background-error covariance] at time tn decreases in significance as n increases’. In other words, with sufficient observations, the Kalman Filter spins up and eventually converges and yields the maximum likelihood solution and its error covariance independently from the initial conditions. During spin-up, however, or when the statistical properties of the dynamical system suddenly change due to strong nonlinearity, the background may be a very unlikely state, and it may be desirable to use the observations more than once in order to extract maximum information about the true dynamics from them.
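The fading influence of the initial covariance can be illustrated with a scalar toy problem. The sketch below is not part of the experiments reported in this note: it assumes a static scalar state observed directly (observation operator equal to 1) with observation-error variance `r`, and the function name `analysis_variance` is hypothetical.

```python
def analysis_variance(p0, r, n_obs):
    """Analysis-error variance of a scalar Kalman Filter after n_obs cycles.

    Assumptions (illustrative only): static perfect model, direct observation
    of the state, constant observation-error variance r.
    """
    p = p0
    for _ in range(n_obs):
        gain = p / (p + r)      # Kalman gain for H = 1
        p = (1.0 - gain) * p    # analysis variance; the static model carries it forward unchanged
    return p


# Two very different initial background-error variances.
after_one_large = analysis_variance(p0=100.0, r=1.0, n_obs=1)
after_one_small = analysis_variance(p0=1.0, r=1.0, n_obs=1)
after_many_large = analysis_variance(p0=100.0, r=1.0, n_obs=50)
after_many_small = analysis_variance(p0=1.0, r=1.0, n_obs=50)

print(after_one_large, after_one_small)    # still far apart after one cycle
print(after_many_large, after_many_small)  # nearly identical after 50 cycles
```

In this toy case the inverse variance obeys 1/p_n = 1/p_0 + n/r, so the term carrying the initial covariance becomes negligible once n/r dominates. During the first few cycles, however, it still controls the analysis, which is the regime that the rest of this section addresses.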
EnKF, like the Kalman Filter, also provides a maximum likelihood analysis, except that the background- and analysis-error covariances are estimated from an ensemble of K generally nonlinear forecasts:

$$\mathbf{P}^b_n \approx \frac{1}{K-1} \mathbf{X}^b_n (\mathbf{X}^b_n)^T \qquad (4)$$

where $\mathbf{X}^b_n$ is a perturbation matrix whose kth column is the background (forecast) perturbation $\mathbf{x}^{b(k)}_n - \bar{\mathbf{x}}^b_n$ and $\bar{\mathbf{x}}^b_n$ is the most likely forecast state, i.e. the ensemble average. Equations similar to those of the Kalman Filter are valid for the analysis mean $\bar{\mathbf{x}}^a_n$ and the analysis error covariance $\mathbf{P}^a_n$. Thus, EnKF, like the original Kalman Filter, is a sequential data assimilation (DA) system where, after the new data are used at the analysis time, they should be discarded (Ide et al, 1997). This is true, however, only if the previous analysis and the new background are not only the most likely states given the past observations, but have also already ‘forgotten’ the choice of initial ensemble and carry the proper perturbation structures corresponding to the true dynamic instabilities. In other words, if the system has converged after the initial spin-up, all the information from past observations is already included in the background and the data can be discarded after the new analysis is computed. In contrast, 4D-Var is a smoother that best fits all the observations (even asynoptic data) within an assimilation window. We note that EnKF can be extended to four dimensions as in 4D-Var, allowing for the assimilation at the right time of asynoptic observations made between two analyses (e.g. Hunt et al, 2004, 2007), but, being a filter, the EnKF analysis is only obtained at the end of the assimilation window.
In summary, after the initial spin-up, all the information from past observations is already included in the background field, so that the observations should be used only once and then discarded. However, there is no theoretical reason why this constraint should also be applied when EnKF is ‘cold-started’, and the initial ensemble is not representative of the most likely state and its uncertainty, since during spin-up the background term still ‘remembers’ the arbitrarily chosen initial ensemble. In practical applications, the rule of using the data only once is usually applied even during spin-up (e.g. Zupanski et al, 2006), and depending on the initial ensemble, a slow EnKF spin-up can then be observed.
In this note we suggest that when a quick EnKF spin-up (in real time) is needed in order to make useful short-range forecasts for fast weather instabilities, the initial observations can be used more than once in order to extract more initial information from them, and that this procedure can lead to a much faster spin-up of the initial ensemble. This RIP algorithm is made possible by the use of a ‘no-cost’ Ensemble Kalman Smoother (EnKS; Kalnay et al, 2007b; Yang et al, 2009).
In an EnKF analysis, the ensemble analysis is a linear combination (weighted average) of the ensemble forecasts at the end of the assimilation window (schematic Figure 1). In the ETKF (Bishop et al, 2001) and the LETKF (Hunt et al, 2007), these weights are computed separately for the analysis ensemble mean and perturbations, as indicated in (5a) and (5b). The weight vector in (5a) is associated with the information from observations and the weight matrix in (5b) is related to the flow-dependent error statistics (‘errors of the day’). The no-cost EnKS uses these weights, and it is referred to as ‘no-cost’ because these weights are already computed in each LETKF analysis.†
Within an assimilation window [tn−1, tn], we assume that perturbations evolve linearly. The smoothed analysis at tn−1 is obtained by applying the weights derived at tn on the analysis perturbations at tn−1, as indicated by (6a) and (6b).
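For reference, the relations cited as (5a), (5b), (6a) and (6b) can be written as follows in LETKF notation. This is a sketch under assumed notation: $\mathbf{X}^b_n$ and $\mathbf{X}^a_n$ denote matrices whose columns are the background and analysis ensemble perturbations, $\bar{\mathbf{w}}^a_n$ (weight vector), $\mathbf{W}^a_n$ (weight matrix) and the tilde marking smoothed quantities are symbols chosen here for illustration.

```latex
\begin{align}
\bar{\mathbf{x}}^a_n &= \bar{\mathbf{x}}^b_n + \mathbf{X}^b_n \bar{\mathbf{w}}^a_n \tag{5a}\\
\mathbf{X}^a_n &= \mathbf{X}^b_n \mathbf{W}^a_n \tag{5b}\\
\tilde{\bar{\mathbf{x}}}^a_{n-1} &= \bar{\mathbf{x}}^a_{n-1} + \mathbf{X}^a_{n-1} \bar{\mathbf{w}}^a_n \tag{6a}\\
\tilde{\mathbf{X}}^a_{n-1} &= \mathbf{X}^a_{n-1} \mathbf{W}^a_n \tag{6b}
\end{align}
```

If perturbations evolve linearly over the window, so that $\bar{\mathbf{x}}^b_n = \mathbf{M}\bar{\mathbf{x}}^a_{n-1}$ and $\mathbf{X}^b_n = \mathbf{M}\mathbf{X}^a_{n-1}$ for a linear model $\mathbf{M}$, applying $\mathbf{M}$ to (6a) and (6b) recovers (5a) and (5b).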
Since a linear combination of ensemble trajectories within an assimilation window is also a model trajectory, a linear combination that is close to the truth at one time within the window should remain close to the truth over the entire window (at least as close as model errors allow). As illustrated in Figure 1, the dashed line indicates the model trajectory constructed by the weights derived at tn: this trajectory ends at the analysis mean as derived from (5a) with the initial state derived from (6a). Since the weights contain the observation information within the assimilation window, the smoothed analysis at tn−1 (the cross in Figure 1) is expected to be more accurate than the analysis mean (the end of the first dashed line in Figure 1) because it knows the ‘future’ observations. This argument (B. Hunt, 2009, personal communication) indicates that the weights used in constructing the analysis ensemble mean, although determined at the end of the assimilation window, should be valid throughout the window. A similar argument suggests that the ensemble analysis perturbation weights obtained using Bayes' theorem are also valid throughout the assimilation window [tn−1, tn] (Hunt et al, 2007). Like other properties of EnKF, this one may be affected by localization.
The no-cost EnKS is easy to implement if the weights that transform the ensemble forecasts into the ensemble analysis are explicitly computed and available, as is the case in the LETKF. The analysis ensemble members at time tn are each a weighted average (linear combination) of the ensemble forecasts valid at tn (Hunt et al, 2007). Since the ensemble analysis estimates the linear combination of the trajectories that best fits the observations within an assimilation window, not just at the end of the interval, the no-cost EnKS valid at the beginning of the window is obtained by simply applying the same weights obtained at analysis time tn to the initial ensemble at tn−1.
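The reuse of the analysis weights can be sketched in a few lines of NumPy. This is a minimal global ETKF without localization or inflation, not the LETKF configuration used in the experiments; the function name `etkf_weights`, the linear model matrix `M`, and all dimensions and error variances are hypothetical choices made for illustration.

```python
import numpy as np


def etkf_weights(xb_mean, Xb, H, R, yo):
    """Return the mean weight vector and perturbation weight matrix of an ETKF analysis."""
    K = Xb.shape[1]
    Yb = H @ Xb                                    # background perturbations in observation space
    C = Yb.T @ np.linalg.inv(R)
    Pa_tilde = np.linalg.inv((K - 1) * np.eye(K) + C @ Yb)  # analysis covariance in ensemble space
    w_mean = Pa_tilde @ C @ (yo - H @ xb_mean)     # weights for the ensemble mean
    evals, evecs = np.linalg.eigh((K - 1) * Pa_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T  # symmetric square root: perturbation weights
    return w_mean, W


rng = np.random.default_rng(0)
m, K, p = 6, 4, 3                                  # state size, ensemble size, number of observations
M = np.eye(m) + 0.2 * rng.standard_normal((m, m))  # hypothetical linear model over one window
H = np.eye(m)[:p]                                  # observe the first p state variables
R = 0.1 * np.eye(p)

Ea_prev = rng.standard_normal((m, K))              # analysis ensemble at t_{n-1}
xa_prev = Ea_prev.mean(axis=1)
Xa_prev = Ea_prev - xa_prev[:, None]

Eb = M @ Ea_prev                                   # forecast ensemble at t_n
xb = Eb.mean(axis=1)
Xb = Eb - xb[:, None]
yo = H @ (M @ rng.standard_normal(m)) + np.sqrt(0.1) * rng.standard_normal(p)

w_mean, W = etkf_weights(xb, Xb, H, R, yo)
xa, Xa = xb + Xb @ w_mean, Xb @ W                  # analysis at t_n
xs, Xs = xa_prev + Xa_prev @ w_mean, Xa_prev @ W   # no-cost smoother at t_{n-1}: same weights

# With a linear model, forecasting the smoothed ensemble reproduces the analysis at t_n.
print(np.allclose(M @ xs, xa), np.allclose(M @ Xs, Xa))
```

The only extra work for the smoother is the two matrix products in the line that computes `xs` and `Xs`; the weights themselves come from the filter analysis.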
The no-cost EnKS was tested by Yang et al (2009) on the QG model of Rotunno and Bao (1996). Figure 2 compares the analysis error of the LETKF with that obtained using the no-cost EnKS, and shows that, indeed, the no-cost ensemble Kalman smoother at tn−1 is more accurate than the analysis ensemble valid at tn−1, as could be expected from the fact that the smoothed ensemble at the beginning of the window has benefited from the information provided by the ‘future’ observations within the window [tn−1, tn]. Although the no-cost smoothing improves the accuracy of the initial analysis at tn−1, it does not improve the final analysis at tn, since the forecasts started from the new initial analysis ensemble will end as the final analysis ensemble (at least in a linear sense; Figure 1 and also the Appendix of Yang et al, 2009).
With the no-cost EnKS it is thus possible to go backwards in time within an assimilation window, and then advance with the regular EnKF using the initial observations repeatedly in order to extract maximum information from them. During spin-up, when the prior (background field and background-error covariance) are not representative of the true state and the ‘errors of the day’, this procedure improves the quality (likelihood) of the initial ensemble mean faster, and leads the ensemble-based background-error covariance to be more representative of the true forecast-error statistics.
As indicated above, EnKF requires the choice of an initial prior ensemble at t0 with covariance $\mathbf{P}^b_0$. We have tested the RIP algorithm using three different initial ensembles, all with the same randomly chosen ensemble mean but with different distributions of the initial random perturbations: (1) a uniform distribution; (2) a Gaussian distribution and (3) perturbations drawn from a carefully optimized 3D-Var error covariance. Cases (1) and (2) include no prior information and each state variable is independently perturbed; case (3) contains the same prior information used for the 3D-Var experiments. The 4D-Var experiments make use of the same 3D-Var covariance with an optimal rescaling. In case (1), the random perturbations are uniformly distributed between −0.05 and 0.05 and in case (2), the Gaussian perturbations have a zero mean with a standard deviation of 0.05.
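The three kinds of initial ensemble can be sketched as follows. The function `initial_ensemble` and the example covariance `B` are hypothetical. Two further assumptions are made for illustration: the perturbations are re-centred so that every ensemble has exactly the prescribed mean, and case (3) is built as a square root of B times a random Gaussian matrix E scaled so that E Eᵀ ≈ I in expectation, with a Cholesky factor standing in for the square root.

```python
import numpy as np


def initial_ensemble(mean, K, kind, rng, B=None, amp=0.05):
    """Build an (M x K) initial ensemble around a given mean state."""
    M = mean.size
    if kind == "uniform":        # case (1): independent, uniform in [-amp, amp]
        pert = rng.uniform(-amp, amp, size=(M, K))
    elif kind == "gaussian":     # case (2): independent, zero mean, standard deviation amp
        pert = rng.normal(0.0, amp, size=(M, K))
    elif kind == "b3dvar":       # case (3): correlated perturbations drawn from B
        E = rng.standard_normal((M, K)) / np.sqrt(K)  # E @ E.T is close to I on average
        pert = np.linalg.cholesky(B) @ E              # Cholesky factor used as a square root of B
    else:
        raise ValueError(f"unknown ensemble kind: {kind}")
    pert -= pert.mean(axis=1, keepdims=True)          # assumption: enforce the same ensemble mean
    return mean[:, None] + pert


rng = np.random.default_rng(0)
M, K = 10, 20
mean = rng.standard_normal(M)                         # randomly chosen mean state

# Hypothetical stand-in for a tuned 3D-Var covariance: exponentially decaying correlations.
idx = np.arange(M)
B = 0.05**2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)

ensembles = {kind: initial_ensemble(mean, K, kind, rng, B=B)
             for kind in ("uniform", "gaussian", "b3dvar")}
```

Only the third ensemble carries spatial correlations between state variables, which is the prior information that cases (1) and (2) lack.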
The RIP algorithm that we have tested (not necessarily optimal) is as follows: at t0 we integrate the initial ensemble to t1. Then the RIP loop, starting with n = 1, is:
(a) Perform a standard EnKF analysis and obtain the analysis weights at tn, saving the mean square observations minus forecast, OMF²(iter), computed by the EnKF.
(b) Apply the no-cost smoother to obtain the smoothed analysis ensemble at tn−1 by using the same analysis weights obtained at tn.
(c) Perturb the smoothed analysis ensemble with a small amount of random Gaussian perturbations, a method similar to additive inflation. These added perturbations have two purposes: they avoid the problem of otherwise reaching the same final analysis at tn as in the previous iteration (Figure 1), and they allow the ensemble perturbations to evolve into fast-growing directions that may not have been included in the unperturbed ensemble subspace‡.
(d) Integrate the perturbed smoothed ensemble to tn. If the mean square difference between the observations and the forecast is smaller than in the previous iteration according to a criterion such as

$$\frac{\mathrm{OMF}^2(\mathrm{iter}) - \mathrm{OMF}^2(\mathrm{iter}+1)}{\mathrm{OMF}^2(\mathrm{iter})} > \varepsilon \qquad (7)$$

then go to (a) and perform another iteration. If not, let tn−1←tn and proceed to the next assimilation window. In the results presented here, we have used ε = 0.05 as the criterion for relative improvement.
(e) If no additional iteration beyond the first one is needed, the RIP analysis is the same as the standard EnKF. When the system converges, no additional iterations are needed, so that if several assimilation cycles take place without invoking a second iteration, the RIP can be switched off and the system returns to a normal EnKF. As expected, we observed that the results are slightly degraded if the RIP continues to be executed after convergence due to overfitting the observations. In the results presented here, we switched off RIP after five cycles without invoking a second iteration.
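Steps (a) to (d) for a single assimilation window can be sketched as below. This is an illustrative global ETKF version, not the LETKF code used for the experiments. All function names are hypothetical, the mean square observation-minus-forecast is assumed to be normalized by the observation-error covariance, and `max_iter` is a safeguard added here that is not part of the algorithm as described.

```python
import numpy as np


def etkf_weights(Eb, H, R_inv, yo):
    """ETKF weights for a forecast ensemble Eb, plus the mean square OMF (assumed normalization)."""
    K = Eb.shape[1]
    xb = Eb.mean(axis=1)
    Xb = Eb - xb[:, None]
    Yb = H @ Xb
    d = yo - H @ xb                                   # observation minus forecast
    Pa = np.linalg.inv((K - 1) * np.eye(K) + Yb.T @ R_inv @ Yb)
    w = Pa @ Yb.T @ R_inv @ d
    vals, vecs = np.linalg.eigh((K - 1) * Pa)
    W = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    return w, W, float(d @ R_inv @ d) / d.size


def apply_weights(E, w, W):
    """Apply mean and perturbation weights to an ensemble valid at any time in the window."""
    mean = E.mean(axis=1)
    X = E - mean[:, None]
    return (mean + X @ w)[:, None] + X @ W


def rip_cycle(Ea_prev, model, H, R_inv, yo, rng, eps=0.05, noise=0.01, max_iter=20):
    """One assimilation window with 'running in place'; returns the analysis at t_n and the iteration count."""
    Eb = model(Ea_prev)
    w, W, omf2 = etkf_weights(Eb, H, R_inv, yo)               # step (a)
    n_iter = 1
    while n_iter < max_iter:
        Es = apply_weights(Ea_prev, w, W)                     # step (b): no-cost smoother at t_{n-1}
        Es = Es + noise * rng.standard_normal(Es.shape)       # step (c): small additive perturbations
        Eb_new = model(Es)                                    # step (d): integrate to t_n
        w_new, W_new, omf2_new = etkf_weights(Eb_new, H, R_inv, yo)
        if (omf2 - omf2_new) / omf2 <= eps:                   # criterion (7) not met: stop iterating
            break
        Ea_prev, Eb, w, W, omf2 = Es, Eb_new, w_new, W_new, omf2_new
        n_iter += 1
    return apply_weights(Eb, w, W), n_iter


rng = np.random.default_rng(1)
m, K = 4, 6
A = np.eye(m) + 0.1 * rng.standard_normal((m, m))             # hypothetical linear model

def model(E):
    return A @ E

H = np.eye(m)[:2]
R_inv = np.eye(2) / 0.01
yo = H @ (A @ rng.standard_normal(m)) + 0.1 * rng.standard_normal(2)
Ea0 = 3.0 + 0.05 * rng.standard_normal((m, K))                # poor first guess with small spread
Ea1, n_iter = rip_cycle(Ea0, model, H, R_inv, yo, rng)
print("iterations used in this window:", n_iter)
```

When the first check of the criterion fails, the function returns the ordinary ETKF analysis with `n_iter` equal to 1, which mirrors step (e). Counting consecutive windows with `n_iter == 1` would give a switch-off rule of the kind described there.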
The LETKF with the RIP method was implemented in the Rotunno and Bao (1996) QG model. The DA experiments are performed with a 12 h analysis cycle. The analysis is validated every 12 h against the truth simulation, a long nature run of this QG model. The validation is done through the RMS analysis error, defined as the domain-averaged RMS difference of the model variables (potential vorticity and temperature) between the analysis and truth. The ‘rawinsonde’ observations are vertical profiles of zonal and meridional wind components and temperature, generated by adding random Gaussian errors to the truth. Details of the QG DA setup can be found in Yang et al (2009). As indicated in section 2, step (c), we also added to the smoothed analysis ensemble Gaussian perturbations with size 0.01, small compared to the amplitude of the model natural variability and to the observation errors.
In this section we compare several DA methods started from the same randomly chosen mean. We measure the (real-time) spin-up by the number of cycles required to reduce the RMS error in potential temperature, which starts from a non-dimensional value of 0.76, to a level of 0.038, i.e. 5% of the initial analysis error (grey line in Figures 3(a), (b) and (c)). The results, including both spin-up time and asymptotic level of analysis error, are also summarized in Table I.
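The spin-up measure defined above reduces to a threshold crossing on the RMS error series. The helper below is a hypothetical illustration: it assumes that the spin-up time is the index of the first cycle at which the error falls to 5% of its initial value, with no requirement that it stay below afterwards, and the error values are made up for the example.

```python
def spinup_cycles(rms_errors, fraction=0.05):
    """Index of the first cycle whose RMS error is at or below fraction * initial error."""
    threshold = fraction * rms_errors[0]
    for cycle, err in enumerate(rms_errors):
        if err <= threshold:
            return cycle
    return None  # the series never reached the threshold


# Made-up error series starting from the initial value 0.76 quoted in the text.
example_errors = [0.76, 0.5, 0.2, 0.04, 0.03, 0.03]
print(spinup_cycles(example_errors))  # the threshold is about 0.038, first met at index 4
```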
Table I. Comparison of the spin-up time (number of DA cycles needed to reduce the initial RMS error in potential temperature to 5% of the original value) and the asymptotic RMS error (×10−2) for LETKF ensembles with and without RIP, and with the number of RIP iterations fixed to 10 rather than determined adaptively. The RMS error is averaged over 120 cycles after spin-up. When using RIP adaptively, the RMS error corresponds to the case where RIP is switched off after 5 cycles with only one iteration (step (e) of the RIP algorithm). Variational methods starting from the same initial state as the ensemble mean are also compared. The variational error covariances have been optimally tuned for both 3D-Var and 4D-Var.
LETKF experiments: (1) random uniform initial ensemble; (2) random Gaussian initial ensemble; (3) B3D-Var initial ensemble; random initial ensemble with fixed 10 RIP iterations.
Figures 3(a), (b) and (c) show the RMS error of the analysis obtained during the spin-up, using several methods over 200 analysis cycles of 12 h each (corresponding to a total of 100 days). In Figure 3(a) we compare the number of cycles required for spin-up for the LETKF with initial random perturbations uniformly distributed, with and without using RIP (black), with 3D-Var and 4D-Var (grey). As indicated before, all the experiments started from the same randomly chosen mean state. 3D-Var (dashed grey line) takes about 60 cycles to spin up, and 4D-Var (full grey line) takes about 80 cycles, but converges to a much lower RMS error than 3D-Var. The standard LETKF (full black line), using the observations once and discarding them, takes much longer, a total of 170 cycles. It is interesting that the LETKF devotes the first 120 cycles essentially to creating ensemble perturbations representative of the ‘errors of the day’, with little reduction in the analysis mean error, and only then, between 120 and 170 cycles, does the LETKF converge rather quickly to the asymptotic level of error (the analysis accuracy that the LETKF with optimized parameters is able to reach). After spin-up, the LETKF and 4D-Var have a similar asymptotic RMS error but significant day-to-day differences. The LETKF with RIP for Case 1 takes about 80 cycles to reach the asymptotic level of error, with the same error level and number of cycles as 4D-Var.
Figure 3(b) compares the spin-up of the LETKF starting from a random uniform ensemble (Case 1), with and without RIP, as in Figure 3(a), with the LETKF started from perturbations drawn from the 3D-Var error covariance (Case 3), i.e. where each ensemble perturbation is a column of the matrix $\mathbf{B}_{\mathrm{3D\text{-}Var}}^{1/2}\mathbf{E}$. Here $\mathbf{E}$ is an M × K matrix whose columns are random Gaussian numbers such that $\mathbf{E}\mathbf{E}^T \approx \mathbf{I}$, M is the dimension of the model and K is the number of ensemble members. It is apparent from Figure 3(b) that, as suggested by both Anderson (2008, personal communication) and Zupanski et al (2006), when the initial ensemble is drawn from the 3D-Var covariance matrix, the spin-up is much faster than when started from random, uniformly distributed perturbations. We can view Figure 3(b) as a comparison between the worst and best choices of the initial ensemble perturbations one could make, since in the first case we make no use of prior information and assume perturbations are non-Gaussian, and in the second we use best-tuned 3D-Var prior information. Nevertheless it is remarkable that, even in the case of faster spin-up, the application of the RIP algorithm is able to accelerate the spin-up even further.
Figure 3(c) compares the spin-up of the LETKF starting from 3D-Var covariance ensemble (Case 3) with and without RIP, as in Figure 3(b), and a Gaussian initial ensemble (Case 2), without any a priori information. Initializing from perturbations with 3D-Var structures spins up faster than using uncorrelated Gaussian perturbations, but such difference disappears when RIP is applied and similar results are obtained. Given that an optimally tuned 3D-Var covariance matrix may not be always available for ensemble-based DA systems, the use of RIP appears to be an attractive alternative.
In an additional experiment in which the LETKF RIP algorithm was forced to always perform 10 iterations (not shown), the LETKF showed an even faster spin-down but it converged to a higher level of error, close to that of 3D-Var (Table I). This is not surprising, since once the system is close to the maximum likelihood solution, as indicated by the theoretical arguments discussed above, observations should be used only once and then discarded. By performing 10 iterations even after the system spun up, the EnKF analysis fits the data too closely and this increases the analysis errors.
Finally, Figure 4 shows the number of iterations needed to accelerate the spin-up in the RIP algorithm when started with random initial ensemble perturbations and with the 3D-Var initial perturbations. One iteration in the figure corresponds to the normal LETKF case, i.e. when a second iteration would give a relative improvement in the fit of the forecast to the observations of less than ε = 0.05 in (7), and thus it is not used. For the B3D-Var initial ensemble (Case 3) only 2 to 6 iterations are needed during the spin-up, but the other two ensembles without prior information need 11 iterations at cycle 19. The last cycle at which a second iteration is executed is cycle 65, 46 and 41 for ensembles (1), (2) and (3), respectively. After RIP is turned off, the analysis accuracy for the three ensembles is essentially identical (Table I).
We found that using a lower value of ε = 0.01 (not shown) leads to a faster initial reduction of errors but requires a large number of iterations. Values of ε within a range 0.02–0.05 gave optimal results, leading to a spin-down of the initial errors similar to 3D-Var and faster than 4D-Var, and converging to an error level at least as good as that of 4D-Var.
The RIP algorithm could be used to accelerate the spin-up of EnKF whenever the background-error statistics change suddenly due to strong nonlinear background instabilities, as in the case of a developing storm, or they are otherwise not appropriate, as when a regional model with high resolution is started from initial conditions from a global model, or when no prior information is available to start the ensemble. In our experiments, RIP is only required during the spin-up period and switches back to the regular LETKF afterwards. Heuristically, however, we have found that RIP is also useful when the underlying dynamics are very nonlinear (Yang and Kalnay, 2010). When a long assimilation window is used so that the ensemble perturbations cannot be assumed to evolve linearly, the use of RIP improves the analysis accuracy and helps the analysis ensemble remain more Gaussian.
The results obtained with RIP are very encouraging; it is possible to significantly accelerate the spin-up of the LETKF (and other EnKF algorithms for which the weights of the ensemble forecasts are available) when fast convergence to the optimal level of error (in terms of real time) is required by simply using the initial observations several times rather than only once. The no-cost Ensemble Kalman Smoother, with the smoothed analysis ensemble at the beginning of an assimilation window given by using the analysis weights of the ensemble forecast at the end of the window, enables this algorithm to extract more information from the initial observations than the regular EnKF. It is necessary to add small perturbations to the ensemble, in a procedure akin to additive inflation. The number of iterations needed is estimated by checking whether the smoothed analysis reduces the forecast error, derived from the innovation (the difference between observation and the background state). A level of relative reduction ε of about 2 to 5% was found to work well in this QG model, leading to about 2 to 6 iterations during spin-up. After the system converges it returns to the original LETKF.
We are grateful to Jidong Gao who pointed out the slow spin-up of EnKF in severe storm prediction using radar data, and to Brian Hunt for explaining why forecast weights computed at the end of the LETKF assimilation window are valid throughout a linear window. Two reviewers of the original note made extremely constructive comments and one of them (Jeff Anderson) suggested starting the ensemble from perturbations drawn from the 3D-Var error covariance. The thoughtful and stimulating guidance of the Associate Editor is also gratefully acknowledged. This research was generously supported by NASA grants NNG06GB77G and NNX08AD90G, and DOE grant DEFG0207ER64437.
† The no-cost smoother is a fixed-lag smoother within the assimilation window (e.g. Ravela and McLaughlin, 2007). The only cost (in addition to the cost of performing an LETKF filter within the assimilation window, which can include observations distributed in space and time) is that of computing a weighted average of the K ensemble members each of size n at the times the smoother is needed. In our application we need the smoother at the beginning of the window, so the cost of the smoothed ensemble mean is just the cost of multiplying a matrix of size n × K by a weighting vector of size K, and the cost of the smoothed analysis perturbations is just the cost of multiplying a matrix of size n × K with a weighting matrix of size K × K. These costs are independent of the number of observations within the assimilation window, and of the window length, but the accuracy of the smoother, like the accuracy of the filter, will degrade when the ensemble perturbations grow nonlinearly, or in the presence of model errors.
‡ We have tested the use of additive perturbations with a Gaussian and with a uniform distribution, and both worked well. The results presented here have Gaussian additive perturbations.