Maximum likelihood estimation of inflation factors on error covariance matrices for ensemble Kalman filter assimilation

Abstract

In the ensemble Kalman filter, the forecast error covariance matrix is estimated as the sampling covariance matrix of the forecast ensemble. However, it is well known that such estimates may be far from the true forecast error covariance matrix. In this paper, an inflation approach for the forecast error covariance matrix, based on maximum likelihood estimation theory, is developed and compared with an existing time-dependent inflation method and with the best-tuned constant inflation. Our method was first tested on a 40-variable Lorenz model using spatially correlated observation errors. In particular, when the observation error variance is incorrectly specified, our proposed method can simultaneously inflate both the forecast and observation error covariance matrices. We then assessed our approach on the two-dimensional Shallow Water Equation model with a higher state dimension and a larger correlated observation system. The results confirmed that our method is effective in retrieving the true states and correcting observation error variances. Copyright © 2011 Royal Meteorological Society

1. Introduction

The Ensemble Kalman Filter (EnKF) is a popular sequential data assimilation approach, which has been examined and applied in a number of studies since it was first introduced by Evensen (1994a, 1994b). Conceptually, by assimilating observations into the model, data assimilation can provide an optimal combination of model outputs and observations. However, the ‘optimality’ of the combination depends on the accuracy of the estimated forecast and observation error covariances; if the estimates are erroneous, the model updates will be suboptimal. Therefore, a correct description of forecast error and observation error covariances is crucial to the high analysis quality of EnKF (e.g. Evensen, 2003). Unfortunately, it is very difficult to have an exact idea of these statistical properties in practice, since the true states are never known and perfect statistics cannot be computed (Sénégas et al., 2001).

In EnKF, the forecast error covariance matrix is estimated as the sampling covariance matrix of the forecast ensemble. However, past work on EnKF found that the sampling error in such estimations, resulting from finite-size ensembles, can generally lead to underestimation of the forecast ensemble covariance and eventually result in filter divergence (e.g. Anderson and Anderson, 1999; Constantinescu et al., 2007). To compensate for this, covariance inflation for increasing ensemble variance is becoming popular. A simple and common way is to inflate the deviation from the forecast ensemble mean by a small constant factor for each ensemble member (Anderson and Anderson, 1999), in which the factor is chosen by repeated experimentation. In section 3.2, we compare our approach with this method.

Dee (1995) and his later work with colleagues (Dee and da Silva, 1999; Dee et al., 1999) proposed a maximum likelihood estimation method for forecast error and observation error covariances. This method first parametrizes the forecast error and observation error covariance matrices, and then estimates the parameters by minimizing the −2log(likelihood) of the observation-minus-forecast residuals. However, their work did not suggest a reliable parametrization for the forecast error covariance matrix, and what is more important, the computation of the determinant in the −2log-likelihood is very difficult in the general case. Consequently, the method has not become very popular for estimating forecast and observation error covariances.

Using statistics of the observation-minus-forecast residuals described in Dee (1995), Wang and Bishop (2003, hereafter denoted W-B) proposed a method of estimating inflation factors on-line in an Ensemble Transform Kalman Filter (ETKF). Building on their work, Li et al. (2009) devised another algorithm to estimate the inflation factor at each analysis step. Anderson (2007, 2009) also proposed a temporally varying inflation method and later a spatially and temporally varying inflation method using a hierarchical Bayesian approach. However, the work of Li et al. (2009) and Anderson (2007, 2009) is limited to the case of uncorrelated observation errors. In this article, we investigate how to estimate inflation factors when observation errors are spatially correlated and observation error statistics may not be correctly known.

As an extended application of the maximum likelihood theory developed in Dee (1995) and Dee and da Silva (1999), Zheng (2009) proposed a ‘multivariate covariance inflation’ extending the inflation factor to a time-dependent diagonal matrix. However, that work tested only a simple model with independent observation errors. The present study develops the work of Zheng further by testing the inflation method on more realistic models with much higher dimensions and with spatially correlated observation errors. We also examined the capability of our approach to inflate the forecast error and observation error covariance matrices simultaneously when the observation error variance is wrongly specified. The inflation factors are, however, constrained to be scalars in this article. In appendix B, we describe an efficient way of computing the determinant in the −2log-likelihood. It is worth noting that this method is practical only when the estimated inflation factors are scalar parameters and the forecast error covariance matrix is of small rank (fortunately, this is the case in our study for the scalar inflation of the forecast error covariance matrix in EnKF). For more complicated parametrizations of the error covariances, different techniques would have to be used to make the computations practical.

This paper comprises five sections. Section 2 summarizes the W-B method and our approach. Section 3 presents the assimilation results on a low-order Lorenz model (Lorenz, 1996). In particular, our approach is compared with the W-B method for cases in which the observation error variance is exactly known. Section 4 provides the validation on a more realistic two-dimensional Shallow Water Equation model with a larger correlated observation system. Conclusions and discussions are given in section 5. Hereafter, we denote our method, which builds on maximum likelihood estimation theory, as MLE. It is also worth noting that in this article the expression ‘inflation factor’ does not specifically represent a factor larger than 1; it can also refer to a factor smaller than 1.

2. Assimilation algorithm

2.1. Ensemble Kalman filter (EnKF)

Following the notation proposed by Ide et al. (1997), a nonlinear time-discrete forecast and linear observation system can be represented as:

$$x_i^t = M_{i-1}\left(x_{i-1}^t\right) + \eta_i \tag{1}$$
$$y_i^o = H_i x_i^t + \varepsilon_i \tag{2}$$

where i is the time step index, $x_i^t$ is the n-dimensional true state vector at time step i, Mi is the nonlinear forecasting operator, $y_i^o$ is the pi-dimensional observation vector, Hi (linear in this paper) is the measurement matrix of dimension pi × n that maps the model states to the observed variables, and ηi and εi are the model error and observation error vectors, which are assumed to be statistically independent of each other and time-uncorrelated, with zero mean and covariance matrices Qi and Ri respectively. The goal of Kalman filtering assimilation is to find a series of analysis states $x_i^a$ that are sufficiently close to the true states $x_i^t$.

Suppose that the perturbed analysis states at the previous time step, $x_{i-1,j}^a$ (where j runs from 1 to the number of ensemble members m), are known and that the analysis state $x_{i-1}^a$ is defined as the ensemble mean of the $x_{i-1,j}^a$. The EnKF (Evensen, 1994a) then consists of the following steps.

  • Step (1). Run the full model forward in time to get the perturbed forecast states:

    $$x_{i,j}^f = M_{i-1}\left(x_{i-1,j}^a\right) \tag{3}$$
  • Step (2). Estimate the forecast error covariance matrix as:

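    $$P_i = \frac{1}{m-1}\sum_{j=1}^{m}\left(x_{i,j}^f - x_i^f\right)\left(x_{i,j}^f - x_i^f\right)^{\mathrm T}, \qquad x_i^f = \frac{1}{m}\sum_{j=1}^{m} x_{i,j}^f \tag{4}$$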
  • Step (3). Calculate the perturbed observation-minus-forecast residuals as:

    $$d_{i,j} = y_i^o + \varepsilon_{i,j} - H_i x_{i,j}^f \tag{5}$$

    where εi,j is normally distributed with zero mean and covariance matrix Ri.

  • Step (4). Compute the perturbed analysis states at time step i:

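    $$x_{i,j}^a = x_{i,j}^f + P_i H_i^{\mathrm T}\left(H_i P_i H_i^{\mathrm T} + R_i\right)^{-1}\left(y_i^o + \varepsilon_{i,j} - H_i x_{i,j}^f\right) \tag{6}$$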
  • Step (5). If $y_i^o$ is not the last observation, set i = i + 1 and repeat the assimilation cycle from Step (1) to Step (4). Otherwise, the filtering ends.
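
The analysis part of the cycle (Steps 2 to 4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `enkf_analysis`, the column-per-member array layout and the toy numbers in the demo are assumptions made here, and the update is written in the standard perturbed-observation form.

```python
import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """One perturbed-observation EnKF analysis (Steps 2-4).

    Xf: (n, m) forecast ensemble, one member per column (assumed layout)
    y:  (p,)   observation vector
    H:  (p, n) linear observation operator
    R:  (p, p) observation error covariance
    Returns the (n, m) analysis ensemble and the (n, n) sample covariance P.
    """
    m = Xf.shape[1]
    A = Xf - Xf.mean(axis=1, keepdims=True)   # deviations from the ensemble mean
    P = A @ A.T / (m - 1)                     # Step 2: sampling covariance
    # Step 3: each member sees the observation plus its own N(0, R) perturbation
    eps = rng.multivariate_normal(np.zeros(y.size), R, size=m).T
    D = y[:, None] + eps - H @ Xf             # (p, m) perturbed residuals
    # Step 4: Kalman update of every member
    S = H @ P @ H.T + R
    Xa = Xf + P @ H.T @ np.linalg.solve(S, D)
    return Xa, P

# Toy demo (all numbers hypothetical): 3 state variables, the first 2 observed.
rng = np.random.default_rng(0)
Xf = 5.0 + rng.normal(size=(3, 20))           # forecast ensemble centred near 5
H = np.eye(2, 3)                              # picks out the first two variables
R = 0.01 * np.eye(2)                          # accurate observations
y = np.zeros(2)
Xa, P = enkf_analysis(Xf, y, H, R, rng)
```

Because the observations in this toy case are far more accurate than the forecast, the analysis mean of the two observed variables is pulled from about 5 to near the observed value 0.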

2.2. Inflation on forecast error covariance matrix

In EnKF, the forecast error covariance matrix is estimated as the sampling covariance matrix Pi (Eq. (4)) of the forecast ensemble (Burgers et al., 1998). However, if the ensemble size is small, sampling errors in such estimates will be significant and the ensemble covariances will underestimate the true error covariances. This may eventually lead to filter divergence, in which the ensemble spread decreases dramatically until the observations ultimately have a negligible impact on the analysis states (Whitaker and Hamill, 2002).

2.2.1. Wang and Bishop's inflation method

At an early stage, this problem was addressed by inflating the forecast ensemble with a factor larger than 1, but the factor is constant in time and chosen by repeated trials (e.g. Anderson and Anderson, 1999). Based on the moment estimation of a parameter in the statistics of observation-minus-forecast residuals di (Eq. (7)), Wang and Bishop (2003) proposed a time-dependent inflation algorithm.

$$d_i = y_i^o - H_i x_i^f \tag{7}$$

Here $x_i^f$ denotes the mean of the forecast ensemble.

In their method, the time-dependent inflation factor λi is estimated as

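$$\lambda_i = \frac{\left(R_i^{-1/2} d_i\right)^{\mathrm T}\left(R_i^{-1/2} d_i\right) - p_i}{\mathrm{trace}\left(R_i^{-1/2} H_i P_i H_i^{\mathrm T} R_i^{-1/2}\right)} \tag{8}$$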

where pi is the number of observations and $R_i^{-1/2}$ is the inverse of the square root of Ri. More details are documented in appendix A.
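
As a rough illustration of this moment estimator, the sketch below assumes it takes the commonly cited form: the squared norm of the whitened residual minus the number of observations, divided by the trace of the whitened forecast covariance in observation space. The function name `wb_inflation` and its arguments are hypothetical, and a Cholesky factor is used in place of the symmetric square root of Ri, which leaves both the quadratic form and the trace unchanged.

```python
import numpy as np

def wb_inflation(d, HPHt, R):
    """Moment estimate of the inflation factor from one residual vector.

    d:    (p,)   observation-minus-forecast residual
    HPHt: (p, p) forecast error covariance mapped to observation space
    R:    (p, p) observation error covariance
    """
    p = d.size
    L = np.linalg.cholesky(R)                 # R = L L^T
    z = np.linalg.solve(L, d)                 # z^T z = d^T R^{-1} d
    # L^{-1} (HPHt) L^{-T}; its trace equals trace(R^{-1/2} HPHt R^{-1/2})
    W = np.linalg.solve(L, np.linalg.solve(L, HPHt).T)
    return (z @ z - p) / np.trace(W)

# Hand-checkable case: R = I, HPHt = 2 I and d = (3, 3, 3, 3)
# give (36 - 4) / 8 = 4.
lam_wb = wb_inflation(np.full(4, 3.0), 2.0 * np.eye(4), np.eye(4))
```

Nothing in this estimator prevents a value below 1, or even a negative value when the residual is unusually small, so a practical implementation would need some safeguard; that choice is outside what the text specifies.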

2.2.2. Inflation method based on the maximum likelihood estimation theory

In this article, we investigated an alternative inflation scheme in which the inflation factor λi is estimated by minimizing the −2log-likelihood of di:

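$$-2L_i(\lambda) = \ln\left\{\det\left(H_i \lambda P_i H_i^{\mathrm T} + R_i\right)\right\} + d_i^{\mathrm T}\left(H_i \lambda P_i H_i^{\mathrm T} + R_i\right)^{-1} d_i \tag{9}$$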

where ‘det’ represents the determinant of the matrix. This method was developed by Zheng (2009) based on a general approach of estimating forecast and observation error statistics in Dee (1995) and Dee and da Silva (1999).
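
A minimal sketch of this estimation step follows, assuming the Gaussian form of the −2log-likelihood (log-determinant plus quadratic form of the residual covariance λ Hi Pi Hiᵀ + Ri, with the additive constant dropped). The names `neg2loglik` and `mle_inflation`, the search bounds and the grid resolution are all hypothetical; a plain grid search stands in for whatever one-dimensional minimizer is preferred.

```python
import numpy as np

def neg2loglik(lam, d, HPHt, R):
    """-2 log-likelihood of the residual d for a scalar inflation factor lam."""
    S = lam * HPHt + R                        # covariance of d under inflation
    _, logdet = np.linalg.slogdet(S)          # numerically stable ln det(S)
    return logdet + d @ np.linalg.solve(S, d)

def mle_inflation(d, HPHt, R, lo=0.1, hi=20.0, num=400):
    """Grid-search minimizer over lam in [lo, hi] (bounds are assumptions)."""
    grid = np.linspace(lo, hi, num)
    costs = [neg2loglik(lam, d, HPHt, R) for lam in grid]
    return grid[int(np.argmin(costs))]

# Isotropic check case: with HPHt = a I and R = r I the minimum is at
# lam = (|d|^2 / p - r) / a, here (36 / 4 - 1) / 2 = 4.
d = np.full(4, 3.0)
HPHt = 2.0 * np.eye(4)
R = np.eye(4)
lam_mle = mle_inflation(d, HPHt, R)
```

In this isotropic case the maximum likelihood and moment estimates coincide; they generally differ once the forecast and observation error covariances have different structures.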

According to the demands of various studies, the inflation factor can also be time constant. In this case, the time-constant inflation factor λ is estimated by minimizing the total −2log-likelihood of di over the full assimilation time period:

$$\sum_{i} -2L_i(\lambda) \tag{10}$$

where −2Li (λ) is defined in Eq. (9).

After λi is estimated, the forecast error covariance matrix is assumed to be λiPi, which is equivalent to using the inflated forecast ensemble equation image (Eq. (11)) to calculate the forecast error covariance matrix defined in Eq. (4).

equation image(11)

Then, according to Eq. (6), the perturbed analysis states $x_{i,j}^a$ are calculated as:

equation image(12)
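
The equivalence stated above, between multiplying Pi by λi and recomputing the sample covariance from an inflated ensemble, can be checked numerically. The sketch assumes the inflated ensemble is obtained by scaling each member's deviation from the ensemble mean by the square root of λi; the function name `inflate_ensemble` and the test dimensions are illustrative only.

```python
import numpy as np

def inflate_ensemble(Xf, lam):
    """Scale deviations from the ensemble mean by sqrt(lam); the mean is kept."""
    mean = Xf.mean(axis=1, keepdims=True)
    return mean + np.sqrt(lam) * (Xf - mean)

rng = np.random.default_rng(3)
Xf = rng.normal(size=(4, 30))                 # 4 variables, 30 members (columns)
lam = 2.5
Xinf = inflate_ensemble(Xf, lam)

# np.cov treats rows as variables and uses the 1/(m-1) normalization,
# matching the sampling covariance of the ensemble.
same_cov = np.allclose(np.cov(Xinf), lam * np.cov(Xf))
```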

For practical applications, especially in situations with an overwhelmingly large number of observations such as very high-resolution satellite images, the calculation of $\left(H_i \lambda_i P_i H_i^{\mathrm T} + R_i\right)^{-1}$ and $\det\left(H_i \lambda_i P_i H_i^{\mathrm T} + R_i\right)$ may be computationally expensive. Tippett et al. (2003) proposed an efficient solution for the inverse. For the determinant, we also found an efficient way to compute it (see appendix B for details of these methods). By avoiding the direct inverse and determinant of a prohibitively high-dimensional matrix, the computational cost of methods based on maximum likelihood estimation is notably reduced.
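
Appendix B is not reproduced in this section. To illustrate why a scalar inflation factor combined with a small-rank forecast covariance makes the determinant cheap, the sketch below uses the standard matrix determinant lemma, det(λYYᵀ + R) = det(R) det(I + λYᵀR⁻¹Y), where Y is the p × m matrix of scaled ensemble deviations in observation space. This may or may not coincide with the authors' procedure; the name `make_logdet` and the test sizes are assumptions.

```python
import numpy as np

def make_logdet(Y, R):
    """Return f(lam) = ln det(lam * Y @ Y.T + R) for a (p, m) matrix Y, m << p."""
    _, logdet_R = np.linalg.slogdet(R)        # computed once, independent of lam
    G = Y.T @ np.linalg.solve(R, Y)           # (m, m) matrix Y^T R^{-1} Y
    gamma = np.linalg.eigvalsh(0.5 * (G + G.T))
    gamma = np.clip(gamma, 0.0, None)         # guard against round-off negatives
    # det(I + lam * G) is the product of (1 + lam * gamma_k)
    return lambda lam: logdet_R + np.sum(np.log1p(lam * gamma))

rng = np.random.default_rng(2)
p, m = 50, 5                                  # hypothetical sizes
Y = rng.normal(size=(p, m))
B = rng.normal(size=(p, p))
R = B @ B.T + p * np.eye(p)                   # a symmetric positive definite R
logdet = make_logdet(Y, R)
direct = np.linalg.slogdet(3.0 * Y @ Y.T + R)[1]
```

After the one-off m × m eigenvalue problem, each new trial value of λ costs only a sum over m terms, which is what makes repeated evaluation inside a minimization affordable.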

2.3. Inflation on both forecast and observation error covariance matrices

Previous studies have pointed out that the effective estimation of the inflation factor relies on the high accuracy of the specified observation error statistics (e.g. Li et al., 2009). In practice, however, it is difficult to know the observation error statistics exactly, and so the observation error covariance matrix may need to be inflated. In this section, we demonstrate that it is possible to adjust inflation factors and observation error statistics simultaneously by minimizing the −2log-likelihood of di, following the pioneering work of Dee and da Silva (1999).

Specifically, we suppose that the observation error covariance matrix is parametrized as μRi, where μ (either constant or variable in time) can be regarded as the inflation factor of Ri. The maximum likelihood theory discussed in section 2.2.2 is easily generalized to estimate both λi and μ. When μ is time constant, it is estimated by minimizing the total −2log-likelihood of di over the full time period:

$$\sum_{i} -2L_i\left\{\lambda_i(\mu), \mu\right\} \tag{13}$$

where time-dependent λi(μ) is estimated by minimizing the function of λ:

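$$-2L_i(\lambda, \mu) = \ln\left\{\det\left(H_i \lambda P_i H_i^{\mathrm T} + \mu R_i\right)\right\} + d_i^{\mathrm T}\left(H_i \lambda P_i H_i^{\mathrm T} + \mu R_i\right)^{-1} d_i \tag{14}$$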

When the number of observations is sufficiently large, we can suppose that μi is time dependent and simultaneously estimated with λi at every assimilation step by minimizing −2Li(λi, μi) as defined in Eq. (14).
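
The per-step joint estimate can be sketched as a two-dimensional search, assuming the residual covariance takes the form λ Hi Pi Hiᵀ + μRi. The function names and the grids are hypothetical, and the brute-force search is only a stand-in for a proper optimizer. The check case is chosen so that the two parameters are separately identifiable: the forecast covariance has rank one, so the second residual component depends on μ alone.

```python
import numpy as np

def neg2loglik_joint(lam, mu, d, HPHt, R):
    """-2 log-likelihood of d for forecast inflation lam and observation factor mu."""
    S = lam * HPHt + mu * R
    _, logdet = np.linalg.slogdet(S)
    return logdet + d @ np.linalg.solve(S, d)

def joint_mle(d, HPHt, R, lam_grid, mu_grid):
    """Exhaustive search over the two grids; returns the best (lam, mu) pair."""
    best_cost, best_lam, best_mu = np.inf, None, None
    for lam in lam_grid:
        for mu in mu_grid:
            cost = neg2loglik_joint(lam, mu, d, HPHt, R)
            if cost < best_cost:
                best_cost, best_lam, best_mu = cost, lam, mu
    return best_lam, best_mu

# Check case: HPHt = diag(1, 0), R = I, d = (3, 1).
# The cost separates into terms in (lam + mu) and mu, with minima at
# mu = 1 and lam + mu = 9, that is lam = 8.
HPHt = np.diag([1.0, 0.0])
R = np.eye(2)
d = np.array([3.0, 1.0])
lam_hat, mu_hat = joint_mle(d, HPHt, R,
                            np.linspace(0.5, 10.0, 20),
                            np.linspace(0.25, 3.0, 12))
```

If HPHt were proportional to R, only a combination of λ and μ would be determined by the residual, which is one way to see why joint estimation needs the forecast and observation errors to be distinguishable in observation space.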

2.4. Validation statistics

In this study, the assimilation results were primarily evaluated by the root-mean-square error (RMSE) of the analysis ensemble:

equation image(15)

where n is the number of model variables, equation image is the jth member of the analysis ensemble for the kth variable at time step i. The smaller the analysis RMSE, the better the assimilation.

Following Anderson (2007), we define:

equation image(16)
equation image(17)

where equation image is the jth member of the inflated forecast ensemble (Eq. (11)) for the kth variable at time step i. Roughly speaking, if equation image and equation image are identically distributed with mean value equation image, the background RMSE and background spread should be consistent with each other.

3. Results with the 40-variable Lorenz model

First, our method was evaluated on the strongly nonlinear Lorenz model (Lorenz, 1996). The governing equation is

$$\frac{\mathrm{d}x_j}{\mathrm{d}t} = \left(x_{j+1} - x_{j-2}\right)x_{j-1} - x_j + F \tag{18}$$

where j = 1, 2, …, J and the indices are cyclic, so that $x_{-1} = x_{J-1}$, $x_0 = x_J$ and $x_{J+1} = x_1$. This model is designed to mimic the time evolution of an unspecified scalar meteorological quantity x at J equally spaced grid points on a circular domain which may be thought of as a latitude circle. Eq. (18) is solved using a fourth-order Runge–Kutta integration scheme with a time step of 0.05 non-dimensional units, which may be thought of as nominally equivalent to 6 h in real-world time.

In this experiment, J = 40 and F = 8. Under these parameter settings, the system behaves chaotically (Lorenz and Emanuel, 1998). The ‘true state’ was generated by integrating the model for 2000 steps. Synthetic observations were available at every model grid point and were generated by adding to the ‘true state’ random noise that is multivariate normally distributed with zero mean and covariance matrix Ri. Note that Ri is not diagonal in this study; therefore, the random noise values (observation errors) at different observation sites are correlated with each other. We assimilated the observations every 4 steps for 2000 steps but only reported the results for the last 1000 steps (the first 1000 steps are treated as spin-up). A small ensemble with 30 members was used. To simulate the ‘model error’ in forecasts of imperfect models, we used F = 5 during the assimilation process unless otherwise specified.
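
The model and the generation of the ‘true state’ can be sketched as follows, using the standard form of the Lorenz (1996) tendency with cyclic indices and the fourth-order Runge–Kutta step of 0.05 described above. The function names and the initial condition (the rest state x = F with one grid point perturbed) are assumptions; the text does not specify how the truth run was initialized.

```python
import numpy as np

def lorenz96_tendency(x, F):
    """Right-hand side of Eq. (18); np.roll supplies the cyclic boundary."""
    # np.roll(x, -1)[j] = x[j+1], np.roll(x, 2)[j] = x[j-2], np.roll(x, 1)[j] = x[j-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, F, dt=0.05):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def integrate(x0, F, n_steps, dt=0.05):
    """Return the trajectory as an (n_steps + 1, J) array, including x0."""
    traj = np.empty((n_steps + 1, x0.size))
    traj[0] = x0
    for k in range(n_steps):
        traj[k + 1] = rk4_step(traj[k], F, dt)
    return traj

J, F_true = 40, 8.0
x0 = np.full(J, F_true)
x0[0] += 0.01                                 # hypothetical initial perturbation
truth = integrate(x0, F_true, 2000)           # 'true state' over 2000 steps
```

Running the same integrator with F = 5 inside the assimilation would then play the role of the imperfect forecast model.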

The leading-diagonal elements of Ri are equation image and the off-diagonal elements are f(d, ρ), where

equation image(19)

Since the 40 observations are spaced in a circular domain, d is defined as the shortest distance between the ith and jth observation sites. 0 < ρ < 1 is a scalar to guarantee that the correlation between two observation locations decreases as d increases. In this experiment, ρ is set to 0.5.
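
A correlated observation error covariance of this kind can be built as sketched below. Because Eq. (19) is not reproduced here, the sketch assumes the off-diagonal elements take the form σ²ρ^d, which satisfies the stated requirement that the correlation decreases with d for 0 < ρ < 1; the actual function in the paper may differ. The variance σ² = 1 and the function name are likewise placeholders, not values taken from the text.

```python
import numpy as np

def circular_obs_cov(J=40, rho=0.5, sigma2=1.0):
    """Covariance with entries sigma2 * rho**d, d = shortest circular distance.

    The functional form rho**d and the default sigma2 are assumptions.
    """
    idx = np.arange(J)
    sep = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(sep, J - sep)           # shortest distance on the circle
    return sigma2 * rho ** dist

R = circular_obs_cov()

# Draw one vector of spatially correlated observation errors: if R = L L^T
# and z ~ N(0, I), then L z ~ N(0, R).
rng = np.random.default_rng(1)
L = np.linalg.cholesky(R)
noise = L @ rng.standard_normal(R.shape[0])
```

Under this assumed form, sites 1 and 40 are neighbours on the circle and therefore have the same covariance as sites 1 and 2.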

3.1. Maximum likelihood estimation of the time-dependent inflation factor

3.1.1. Comparison to the EnKF without inflation

The analysis RMSE and −2log-likelihood of di for the EnKF without inflation and our proposed approach are plotted in Figure 1. The −2log-likelihood and, accordingly, the analysis RMSE in our approach are notably reduced relative to those in EnKF. The majority of estimated inflation factors vary between 1 and 5. In addition, as shown in Figure 2, the ratio of ‘background RMSE’ to ‘background spread’ is much larger than 1 in the EnKF without inflation, indicating that the forecast ensemble spread is underestimated. Applying the time-dependent inflation technique in our approach, the underestimation of background spread is effectively compensated and therefore produces an improved agreement between the background RMSE and background spread.

Figure 1.

(a) The inflation factors λi, (b) −2log-likelihood, (c) analysis RMSE for the EnKF without inflation (solid line) and our approach (MLE) with time-dependent inflation (dotted line). Observation errors are correlated. Ensemble size is 30, and F = 5.

Figure 2.

The ratio of background RMSE to background spread for the EnKF without inflation (solid line) and our approach (MLE) with time-dependent inflation (dotted line). Observation errors are spatially correlated. Ensemble size is 30, and F = 5.

3.1.2. Comparison to Wang and Bishop's inflation method

Although both our approach and the W-B method adopt the statistics of di to estimate time-dependent inflation factors, these two methods are different. The former minimizes the −2log-likelihood of di (Eq. (9)), while the latter uses moment estimation (Eq. (8)). In this section, we compare these two methods using the 40-variable Lorenz model with a set of model errors.

The Lorenz model is a forced dissipative model with a parameter F that controls the strength of the forcing (Eq. (18)). The model behaves quite differently with different values of F and produces chaotic systems with integer values of F larger than 3. As such, we used a set of values of F to simulate a wide range of model errors. In all cases, the true states and observations were generated by a model with F = 8. These observations were then assimilated into models with F = 4,5,…,12.

Figure 3 plots the time-mean analysis RMSE of the two methods averaged over 1000 steps, as a function of F. Overall, the analysis RMSE of both methods gradually grows when the model error is increased. When F is around the optimal value 8, W-B and MLE with time-dependent inflation have almost indistinguishable values of analysis RMSE. However, when F becomes increasingly distant from 8, the analysis RMSE of MLE becomes progressively smaller than that of W-B.

Figure 3.

Time-mean values of analysis RMSE for the W-B method and our proposed approach (MLE) with time-dependent inflation, as a function of forcing F. Observation errors are spatially correlated. Ensemble size is 30.

3.1.3. Observation error variance is incorrectly specified

In this experiment, the true value of observation error variance is equation image, but it is erroneously assigned to equation image. Our proposed inflation scheme was performed with the correctly and incorrectly assigned observation error statistics respectively, and the results are summarized in Table I. It shows that relative to Case 1 (with true equation image), Case 2 (with wrong equation image) has larger analysis RMSE, indicating that assimilation quality is associated with the accuracy of the specified observation error statistics. Meanwhile, −2log-likelihood for the correct observation error variance is also smaller than that for the incorrect variance.

Table I. The time-mean values of inflation factor λi, −2log-likelihood and analysis RMSE in three cases, as well as the constant μ in Case 3, for time-dependent inflation.

Case | time-mean λi | −2log-likelihood | analysis RMSE | μ
Case 1: true observation error variance, only λi is estimated | 3.01 | 97.67 | 1.25 | –
Case 2: wrongly specified observation error variance, only λi is estimated | 1.90 | 109.69 | 1.79 | –
Case 3: wrongly specified observation error variance, λi and μ are estimated | 2.85 | 96.83 | 1.31 | 0.3

Results are for the imperfect situation in which F = 5 with spatially correlated observation errors and a 30-member ensemble.

To mitigate the problem arising from the wrongly assigned observation error variance, we parametrized the observation error variance as μ times the specified value, and simultaneously estimated the inflation factors λi and the time-constant μ by minimizing $\sum_i -2L_i\{\lambda_i(\mu), \mu\}$ (Eq. (13)) as documented in section 2.3. The results are also shown in Table I. By introducing μ (Case 3), the observation error variance is corrected to 1.2 (0.3 × 4), which is close to the true equation image and leads to a time-mean analysis RMSE similar to that in Case 1. However, there is unavoidable estimation error in the statistical sense. In addition, strictly speaking, $\sum_i -2L_i\{\lambda_i(\mu), \mu\}$ is the −2log-likelihood only if the di are independent of each other (i.e. white), which is the optimality condition for the filter. In a nonlinear system, however, especially when model errors are large, the di could have significant correlations in time. These factors may explain to some extent why the wrong observation error variance is not corrected exactly to the true variance.

3.2. Maximum likelihood estimation of the time-constant inflation factor

When the true states are available, one can find the best constant inflation factor by repeated trials, choosing the value that gives the smallest analysis error (analysis RMSE). This approach is referred to here as the best tuned inflation (e.g. Anderson and Anderson, 1999). In practical applications, the true states are not known and such tuning is not possible. In this section, we estimated the time-constant λ by minimizing the total −2log-likelihood of di over the full time period, $\sum_i -2L_i(\lambda)$ (Eq. (10)), and compared the results with those obtained using the best tuned inflation method.

The results for both methods in imperfect model scenarios with F = 5 and F = 12 are shown in Table II. In the presence of large model errors, the MLE with time-constant inflation has quite similar but still slightly higher analysis RMSE and background RMSE, relative to the best tuned inflation. However, the inflation factors in MLE with time-constant inflation are smaller and thus the background spread is closer to the background RMSE, compared with those of the best tuned inflation. This indicates that the inflated forecast error covariances in MLE may not sufficiently account for the structure of the ‘true’ large model errors. In this case, as Anderson (2007) and Li et al. (2009) analysed, the best tuned inflation behaves better by overinflating the forecast error covariances and thus giving heavier weight to the observations, which acts to reduce the impact of the model and overcome the errors in representing the model bias. In addition, as stated in section 3.1.3, in the strict sense $\sum_i -2L_i(\lambda)$ is not the −2log-likelihood because di may have large correlations in time. This may also slightly degrade the performance of MLE with time-constant inflation.

Table II. The time-constant inflation factor λ and time-mean values of −2log-likelihood, analysis RMSE, background RMSE and background spread for the best tuned inflation and MLE with time-constant inflation.

Quantity | F = 5, best tuned inflation | F = 5, MLE with time-constant inflation | F = 12, best tuned inflation | F = 12, MLE with time-constant inflation
λ | 6.1 | 3.7 | 9.7 | 6.8
−2log-likelihood | 106.21 | 92.73 | 143.86 | 132.88
analysis RMSE | 1.12 | 1.14 | 1.39 | 1.44
background RMSE | 1.28 | 1.31 | 1.61 | 1.68
background spread | 2.09 | 1.33 | 3.61 | 2.64

The results are for imperfect model scenarios with F = 5 and F = 12 in the presence of observation error correlations. Ensemble size is 30.

To further compare the performance of time-dependent and time-constant inflation, Table III displays the time-mean results of MLE with time-dependent inflation. In imperfect situations with large model errors, although time-dependent inflation shows better consistency between background RMSE and background spread, the analysis RMSE and background RMSE are higher than those of the best tuned inflation method (Table II). In addition, compared to MLE with time-constant inflation (Table II), MLE with time-dependent inflation also produces larger time-mean values of −2log-likelihood and, correspondingly, higher analysis RMSE. This case suggests that the time-dependent inflation method is not always superior to the time-constant inflation method. Both have their advantages and disadvantages. In the time-constant inflation approach, the inflation factor is tuned and balanced over the entire assimilation time period so that the interaction of analysis states at different steps is optimized. In contrast, the time-dependent inflation method computes the inflation factor at every analysis step, but ignores the performance of forecast states at later time steps. However, the repeated tuning in the time-constant inflation method requires repeated ensemble forecasts and therefore costs more computationally than the time-dependent inflation method.

Table III. The time-mean values of inflation factor λi, −2log-likelihood, analysis RMSE, background RMSE and background spread for MLE with time-dependent inflation.

Quantity | F = 5, MLE with time-dependent inflation | F = 12, MLE with time-dependent inflation
time-mean λi | 3.0 | 4.9
−2log-likelihood | 97.67 | 143.42
analysis RMSE | 1.25 | 1.69
background RMSE | 1.43 | 1.95
background spread | 1.05 | 1.87

The results are for imperfect model scenarios with F = 5 and F = 12 in the presence of observation error correlations. Ensemble size is 30.

The aforementioned results for the Lorenz model used an ensemble with 30 members. We also examined the sensitivity of the inflation methods to the ensemble size. Relative to the 30-member ensemble, a 20-member ensemble almost doubles the time-mean analysis RMSE, while a 40-member ensemble reduces it slightly, by 10%. Ensembles with fewer than 10 members became unstable, and no significant changes occurred for ensembles with more than 50 members. Regardless of the problems related to overestimating distant covariances and under-representing the real physical dimension, we feel that a 30-member ensemble is necessary to estimate statistically robust forecast error covariances.

4. Results in the two-dimensional Shallow Water Equation model

As an extension of our previous work, the following experiment was conducted on a more realistic two-dimensional Shallow Water Equation model (SWE). The barotropic nonlinear SWE takes the following form (Lei and Stauffer, 2009):

equation image(20)

where the fluid velocity components u in the x direction and v in the y direction, together with the fluid depth h, are the model variables; g is the gravitational acceleration, 9.8 m s⁻²; f is the Coriolis parameter, defined as the constant 10⁻⁴ s⁻¹ using the f-plane approximation; and the diffusion coefficient k is given as 10⁴ m² s⁻¹. L and D are the domain dimensions of model integration, which are set to 500 km and 300 km respectively. The model is discretized with a uniform grid spacing of 10 km in the x and y directions and integrated using the Lax–Wendroff method with a time step of 30 s. Thus, SWE has a dimension of 50 × 31 for each state variable. Periodic boundary conditions are used at the x boundaries, and a free-slip rigid wall boundary condition, where u and h are defined from the values one point inside the boundary, is used at the y boundaries. The initial height (depth) field is given by:

equation image(21)

where H0, H1 and H2 are set to 50.0 m, 5.5 m and 3.325 m respectively. The initial velocity field is derived from the initial height field with the geostrophic relation.

In our experimental design, SWE is integrated for 72 hours and the results are taken as the ‘true states’. We then set up synthetic observations of u, v and h at 310 randomly located grid points every 3 hours in a 48-hour assimilation period. A 24-hour forecast is run after the assimilation. The observation error variances are specified as equation image m² s⁻², equation image m² s⁻², equation image m². The observation errors of each variable are spatially correlated and ρ (Eq. (19)) is set to 0.5, but observation errors of different variables are assumed to be independent of each other. To simulate ‘model error’, we used k = 5 × 10⁴ m² s⁻¹ in the assimilation. Ensemble size is 100.

Figure 4 displays the inflation factors, analysis RMSE of u and h for the EnKF without inflation and our approach with time-dependent inflation assuming that the observation error variance is true. The results of v are qualitatively similar to those of u. It is seen that the maximum likelihood estimation in our method is successful in obtaining lower analysis RMSE than the EnKF without inflation in the 48-hour assimilation and subsequent 24-hour forecast periods.

Figure 4.

(a) Inflation factors λi, analysis RMSE of (b) u velocity and (c) height h for the EnKF without inflation (solid line) and our approach (MLE) with time-dependent inflation (dashed line). Observation errors are spatially correlated.

When the observation error variances of u, v and h are erroneously specified (equation image m² s⁻², equation image m² s⁻², equation image m² here), we simultaneously estimated the inflation factors λi and the time-dependent μi at each analysis step by minimizing −2Li(λi, μi) (Eq. (14)). Unlike the constant μ used in the Lorenz model, μi is time variable in this SWE model. The results of u, v and h are qualitatively similar, so only those of h are plotted in Figure 5. When the observation error variances are wrongly assigned as 10 times the true variances, our method based on MLE successfully retrieves the true observation error variances and results in essentially the same values of analysis RMSE as those obtained using true variances. We also attempted to apply time-dependent μi to the Lorenz model, but the estimated μi showed large oscillations and led to an unsatisfactory analysis, perhaps due to the insufficient number of observations. Using the SWE model, which has many more observations (310 for each variable), we can see here that the wrong observation error variance is efficiently and stably corrected to the true variance.

Figure 5.

(a) Observation error variance and (b) analysis RMSE of height h in three cases: using true observation error variance and only estimating time-dependent λi (thin solid line), using wrongly specified observation error variance and only estimating time-dependent λi (thick solid line), and using wrongly specified observation error variance and simultaneously estimating time-dependent λi and μi (dotted line).

5. Discussions and conclusions

In the Ensemble Kalman Filter (EnKF), the forecast error covariance matrix is estimated as the sampling covariance matrix of the forecast ensemble. However, it is well known that such estimations may be far from the true forecast error covariance matrix. In this situation, the analysis results derived from EnKF may be of poor accuracy. In this article, this point is further confirmed.

A popular way to improve the estimation of the forecast error covariance matrix is the covariance inflation, whether time-dependent or time-constant. In this article, an inflation approach on forecast error and observation error covariance matrices based on minimizing the −2log-likelihood of observation-minus-forecast residuals di is studied. This idea was originally proposed in Dee (1995) and his later work with colleagues (Dee and da Silva, 1999; Dee et al., 1999). Our study indicates that inflating forecast error and observation error covariance matrices could be a reliable way to advance Dee's idea.

Wang and Bishop (2003) also proposed a method of on-line estimation of inflation factors based on the statistics of di. However, they used moment estimation, not the maximum likelihood estimation. In this paper, the W-B method and our proposed approach were tested using the 40-variable Lorenz model. The results suggested that, in this case, the two methods are almost equivalent for smaller model errors, but maximum likelihood estimation may perform better for larger model errors. A main advantage of the W-B method is that it is more computationally efficient. However, as we have shown in this article, maximum likelihood estimation may also be applicable with current computational power. A major advantage of maximum likelihood estimation is that both inflation factors and parameters of the observational error covariance matrix can be estimated simultaneously, following the idea of Dee and da Silva (1999). As a result, the estimation of inflation factor may not be too sensitive to the accuracy of observation error statistics. However, this is not the case for the W-B estimation.

We further validated our approach on a more realistic two-dimensional Shallow Water Equation (SWE) model with a much larger state dimension and different sets of spatially correlated observations. The results confirmed that simultaneous inflation of the forecast and observation error covariance matrices based on maximum likelihood estimation is capable of retrieving the true observation error variance and producing satisfactory assimilations. This suggests that our approach has potential in practical applications. Li et al. (2009) also proposed a method to simultaneously compute the inflation factor and observation error variance at each analysis step. However, their work only addressed spatially independent observation errors.

Despite the success of MLE with the Lorenz and SWE models, the method has some fundamental limitations, as discussed in Dee and da Silva (1999). The first is a possible lack of identifiability of the estimated parameters $\lambda_i$ and $\mu_i$. Simultaneous estimation of multiple parameters in MLE is possible only when all parameters are jointly identifiable; in other words, the −2 log-likelihood must have a unique global minimum. This requires that the characteristics of the observation errors and forecast errors are distinguishable in the observation space. However, to the extent that the observation errors and forecast errors in this article have similar characteristics (e.g. spatial correlations), it is difficult to estimate $\lambda_i$ and $\mu_i$ separately based on the residuals $d_i$ alone. In fact, identifiability is not specific to the maximum likelihood method, but is a general problem for any parameter estimation method based on minimizing a cost function. The other limitation concerns the approximations involved in the MLE method. The true covariance model may not be expressible in the form $\lambda_i P_i$ or $\mu_i R_i$ for any values of $\lambda_i$ or $\mu_i$. Even if the covariance model is appropriate, the parameter estimates may be far from optimal because, for example, the model bias is not properly accounted for. The practical consequence of these limitations is that one does not always get results that are consistent with theory (e.g. the results in Tables I–III). Detailed analysis of these limitations and possible solutions can be found in Dee and da Silva (1999) and Dee et al. (1999).
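The identifiability problem can be seen in an idealized special case, offered here as an illustration rather than a result from the article. Assume the innovation covariance is modelled as $\lambda_i H_i P_i H_i^{T} + \mu_i R_i$, and suppose that in observation space the forecast error covariance is exactly proportional to the observation error covariance, with a hypothetical constant $c > 0$:

```latex
H_i P_i H_i^{T} = c\, R_i
\quad\Longrightarrow\quad
\lambda_i H_i P_i H_i^{T} + \mu_i R_i = (c\,\lambda_i + \mu_i)\, R_i .
```

Under this assumption the −2 log-likelihood depends on $(\lambda_i, \mu_i)$ only through the combination $c\,\lambda_i + \mu_i$, so every pair on the line $c\,\lambda_i + \mu_i = \text{const}$ fits the residuals equally well and no unique minimum exists. Real covariances are rarely exactly proportional, but the closer they come, the flatter the likelihood is along that direction.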

Last but not least, this study assumes that the inflation factor is constant in space. This is clearly not the case in many practical applications, especially when the observations are unevenly distributed. If inflation values large enough to address problems in densely observed areas are persistently applied to all state variables, the ensemble variances in sparsely observed areas can be systematically overinflated. If left unchecked, this will likely lead to unreasonable or even unacceptable output. To solve this problem, Zheng (2009) extended the MLE method from a single inflation factor to multivariate inflation, but the method has not yet been verified using relatively reliable models. Anderson (2009) also proposed an approach that allows inflation factors to vary in both time and space. However, his work is limited to cases in which the observation error statistics are correctly known and the observation errors are independent of each other. In the future, we will focus on developing a time- and space-dependent inflation technique in the presence of observation error correlations and on testing its capability in real applications.

Appendix A: Inflation Method Proposed by Wang and Bishop

Suppose that the forecast errors and observation errors are uncorrelated; the following relationship is satisfied:

equation image(A1)

where 〈 〉 represents the mathematical expectation operator, within which the vector $d_i$ is regarded as a random vector.

As stated in Wang and Bishop (2003), for the global observational networks the number of independent scalar elements within equation image is large, thus equation image distributes closely around its mean value equation image. With this assumption, Eq. (A1) becomes:

equation image(A2)

Consequently, $\lambda_i$ can be estimated as given in Eq. (8).
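The moment-matching idea of this appendix can be sketched as follows. This is a minimal illustration under stated assumptions: the innovation is normalized by $R^{-1/2}$, a single scalar inflation factor is used, and the squared norm of the normalized innovation is equated with the trace of its modelled covariance. Whether Eq. (8) takes exactly this form is an assumption here, and the function name `moment_inflation` is hypothetical.

```python
import numpy as np

def moment_inflation(d, HPHt, R):
    """Moment estimate of a scalar inflation factor lam (assumed form).

    Equates d^T R^{-1} d with trace(lam * R^{-1} HPHt + I) and solves for lam.
    The trace identity trace(R^{-1/2} A R^{-1/2}) = trace(R^{-1} A) avoids
    forming a matrix square root.
    """
    p = d.size
    normalized_sq_norm = d @ np.linalg.solve(R, d)
    trace_term = np.trace(np.linalg.solve(R, HPHt))
    return (normalized_sq_norm - p) / trace_term

# Toy usage with assumed values: two observations, R = I, HPHt = 2 I.
d = np.array([2.0, 2.0])
print(moment_inflation(d, 2.0 * np.eye(2), np.eye(2)))
```

Because the estimate relies on a single realization standing in for its expectation, it can be noisy or even negative when only a few observations are available, which is why the appendix stresses that the number of independent elements should be large.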

Appendix B: Efficient Calculation in Maximum Likelihood Estimation

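1. Efficient calculation of $(H_i \lambda_i P_i H_i^{T} + R_i)^{-1}$

Provided that $R_i^{-1}$ is inexpensive to obtain, $(H_i \lambda_i P_i H_i^{T} + R_i)^{-1}$ can be calculated using the Sherman–Morrison–Woodbury identity (Golub and Van Loan, 1996), as proposed in Tippett et al. (2003):

$$(H_i \lambda_i P_i H_i^{T} + R_i)^{-1} = R_i^{-1} - R_i^{-1} H_i Z_i \left( \lambda_i^{-1} I + Z_i^{T} H_i^{T} R_i^{-1} H_i Z_i \right)^{-1} Z_i^{T} H_i^{T} R_i^{-1} \qquad \text{(B1)}$$

where $I$ is the $m \times m$ identity matrix and the $n \times m$ matrix $Z_i$ is the square root of $P_i$:

$$P_i = Z_i Z_i^{T} \qquad \text{(B2)}$$

For ensemble filters, $Z_i$ can be taken as:

$$Z_i = \frac{1}{\sqrt{m-1}} \left[ x_{i,1}^{f} - \bar{x}_i^{f}, \; \ldots, \; x_{i,m}^{f} - \bar{x}_i^{f} \right] \qquad \text{(B3)}$$

In this way, only an $m \times m$ matrix needs to be inverted, and the computational cost is thus significantly reduced.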
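The saving can be checked numerically with a small sketch. It assumes the matrix to invert has the form $\lambda\,(HZ)(HZ)^{T} + R$ with a linear observation operator and symmetric $R$; the function name `woodbury_inverse` and all sizes and data below are hypothetical stand-ins.

```python
import numpy as np

def woodbury_inverse(lam, HZ, R_inv):
    """Inverse of lam * HZ @ HZ.T + R, given R_inv, via an m x m solve.

    HZ is the p x m matrix H @ Z; R is assumed symmetric.
    """
    m = HZ.shape[1]
    RiHZ = R_inv @ HZ                          # p x m
    small = np.eye(m) / lam + HZ.T @ RiHZ      # m x m, the only system solved
    return R_inv - RiHZ @ np.linalg.solve(small, RiHZ.T)

# Toy check against a direct p x p inverse (sizes and data are arbitrary).
rng = np.random.default_rng(0)
n, m, p, lam = 10, 3, 6, 1.7
X = rng.standard_normal((n, m))                            # stand-in forecast ensemble
Z = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)   # square root of P
H = rng.standard_normal((p, n))                            # stand-in observation operator
r = rng.uniform(0.5, 2.0, size=p)                          # diagonal of R
fast = woodbury_inverse(lam, H @ Z, np.diag(1.0 / r))
direct = np.linalg.inv(lam * (H @ Z) @ (H @ Z).T + np.diag(r))
print(np.allclose(fast, direct))
```

The benefit grows with the gap between the number of observations $p$ and the ensemble size $m$: the direct route factors a $p \times p$ matrix, whereas the sketch only solves an $m \times m$ system.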

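2. Efficient calculation of $\det(H_i \lambda_i P_i H_i^{T} + R_i)$

$\det(H_i \lambda_i P_i H_i^{T} + R_i)$ can be calculated efficiently using the singular value decomposition of the $p_i \times m$ matrix $R_i^{-1/2} H_i Z_i$:

$$R_i^{-1/2} H_i Z_i = U_i D_i V_i^{T} \qquad \text{(B4)}$$

where $U_i$ and $V_i$ are orthogonal matrices and $D_i$ is a $p_i \times m$ matrix whose leading-diagonal elements are the singular values of $R_i^{-1/2} H_i Z_i$. Then we have:

$$\det(H_i \lambda_i P_i H_i^{T} + R_i) = \det\!\left[ R_i^{1/2} \left( \lambda_i U_i D_i D_i^{T} U_i^{T} + I \right) R_i^{1/2} \right] = \det(R_i) \prod_{j=1}^{\min(p_i,\, m)} \left( 1 + \lambda_i d_{i,j}^{2} \right) \qquad \text{(B5)}$$

where $R_i^{1/2}$ is the square root of $R_i$, $I$ is a $p_i \times p_i$ identity matrix, and $d_{i,j}$ is the $j$th diagonal element of $D_i$.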

In this way, only the determinant of $R_i$ needs to be computed. In real applications, $R_i$ is usually assumed to be diagonal and/or constant in time, or to vary linearly and much less frequently than the forecast error covariance matrix $P_i$. Therefore, $\det(R_i)$ usually needs to be calculated only a few times, or even just once.
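A short sketch of the determinant calculation follows. It assumes the target is $\log\det(\lambda\,(HZ)(HZ)^{T} + R)$ with diagonal $R$, so that $R^{-1/2}$ is a simple row scaling; the function name `logdet_innovation_cov` and the random test data are hypothetical.

```python
import numpy as np

def logdet_innovation_cov(lam, singular_values, logdet_R):
    """log det(lam * HZ HZ^T + R) from the singular values of R^{-1/2} HZ."""
    return logdet_R + np.sum(np.log1p(lam * singular_values**2))

rng = np.random.default_rng(1)
p, m = 6, 3
HZ = rng.standard_normal((p, m))       # stand-in for H @ Z
r = rng.uniform(0.5, 2.0, size=p)      # diagonal of R (assumed diagonal)

# Both quantities below are computed once and reused for every trial lam.
s = np.linalg.svd(HZ / np.sqrt(r)[:, None], compute_uv=False)
logdet_R = np.sum(np.log(r))

for lam in (0.5, 1.0, 3.0):
    fast = logdet_innovation_cov(lam, s, logdet_R)
    _, direct = np.linalg.slogdet(lam * HZ @ HZ.T + np.diag(r))
    print(lam, np.isclose(fast, direct))
```

Since the singular values do not depend on the inflation factor, each evaluation of the log-determinant inside a likelihood minimization reduces to a sum over at most $\min(p, m)$ terms.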

Acknowledgements

This work was supported by the National Program on Key Basic Research Project of China (Grant No. 2010CB951604), the National High-tech R&D Program of China (Grant No. 2009AA122104), and the National Natural Science Foundation of China General Program (Grant No. 40875062). The authors gratefully acknowledge the two anonymous reviewers for their constructive and relevant comments, which helped greatly in improving the quality of this manuscript. The authors are also grateful to the editors for their hard work and suggestions on this manuscript. Thanks also go to Peggy Chen and Yanyan Zheng for their editorial comments.
