## 1. Introduction

[2] The four-dimensional variational data assimilation (4DVAR) method [*Johnson et al.*, 2006; *Kalnay et al.*, 2007; *Tsuyuki and Miyoshi*, 2007] has been used very successfully in operational numerical weather prediction (NWP) at many weather forecast centers [*Bormann and Thepaut*, 2004; *Park and Zou*, 2004; *Caya et al.*, 2005; *Bauer et al.*, 2006; *Rosmond and Xu*, 2006; *Gauthier et al.*, 2007]. The 4DVAR technique has two attractive features: (1) the physical (forecast) model provides a strong dynamical constraint, and (2) it can assimilate observational data at multiple times. However, 4DVAR requires the linearization of the forecast model, and coding, maintaining, and updating the adjoint model of the forecast model remain challenging. Usually, the control variables (or initial states) are expressed implicitly in the cost function. To compute the gradient of the cost function with respect to the control variables, one has to integrate the adjoint model, whose development and maintenance require significant resources, especially when the forecast model is highly nonlinear and the model physics contains parameterized discontinuities [*Xu*, 1996; *Mu and Wang*, 2003]. Much effort has been devoted to avoiding the integration of the adjoint model or reducing its high computational cost [*Courtier et al.*, 1994; *Kalnay et al.*, 2000; *Wang and Zhao*, 2005]. Nevertheless, the tangent linear model of the forecast model is still required in all these methods. On the other hand, the usual ensemble Kalman filter (EnKF) [e.g., *Evensen*, 1994, 2003; *Kalnay et al.*, 2007; *Beezley and Mandel*, 2008; also see Appendix A] has become increasingly popular because of its simple conceptual formulation and relative ease of implementation. For example, it requires no derivation of a tangent linear operator or adjoint equations, and no integrations backward in time.
Furthermore, its computational costs are affordable and comparable with those of other popular and sophisticated assimilation methods such as 4DVAR. By forecasting the statistical characteristics, EnKF can provide flow-dependent estimates of the background errors using the Monte Carlo method, but it lacks the dynamical constraint of 4DVAR. *Heemink et al.* [2001] developed a variance-reduced EnKF method that uses a reduced-rank approximation technique to lower the high computational cost. *Farrell and Ioannou* [2001] also proposed a reduced-order Kalman filter based on the balanced truncation model-reduction technique. *Uzunoglu et al.* [2007] modified a maximum likelihood ensemble filter method [*Zupanski*, 2005] through an adaptive methodology. All three of these methods belong to the family of Kalman filters. *Vermeulen and Heemink* [2006] attempted to combine 4DVAR with EnKF; however, the tangent linear model is still needed in their method. How to retain the two primary advantages of the traditional 4DVAR while avoiding the need for an adjoint or tangent linear model of the forecast model has become a roadblock in advancing data assimilation. Recently, Qiu et al. [*Qiu and Chou*, 2006; *Qiu et al.*, 2007a, 2007b] proposed a new method for 4DVAR (more details below) using the singular value decomposition (SVD) technique based on the theory of atmospheric attractors. *Cao et al.* [2007] applied the proper orthogonal decomposition (POD) technique [*Ly and Tran*, 2001, 2002; *Volkwein*, 2008; also see Appendix C] to 4DVAR to reduce the order of the forecast model and thereby the computational cost, but the adjoint integration is still necessary in their method. *Luo et al.* [2007] also applied the POD technique to a tropical ocean reduced-gravity model.

[3] Here we resort to the idea of the Monte Carlo method and the POD technique. The basic idea of the POD technique is to start with an ensemble of data, called *snapshots*, collected from an experiment or a numerical procedure of a physical system. The POD technique is then used to produce a set of base vectors that span the snapshot collection. The goal is to represent the ensemble of the data in terms of an *optimal* coordinate system; that is, the snapshots can be generated by the smallest possible set of base vectors. On the basis of this approach, an explicit new 4DVAR method is proposed in this paper. It begins with a 4-D ensemble obtained from the forecast ensembles at all times in an assimilation time window, produced using the Monte Carlo method. We then apply the POD technique to the 4-D forecast ensemble, so that the orthogonal base vectors not only capture the spatial structure of the state but also reflect its temporal evolution. After the model state is expressed by a truncated expansion of the base vectors obtained using the POD technique, the control variables appear explicitly in the cost function, so that the adjoint or tangent linear model is no longer needed.
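As an illustration of the POD step described above, the following minimal sketch extracts orthogonal base vectors from a snapshot ensemble with a thin SVD and expresses the snapshots as a truncated expansion in those vectors. It is not the implementation used in this paper: the function name `pod_basis`, the energy-based truncation threshold, and the synthetic sine-mode ensemble are all assumptions made for illustration. For a 4-D ensemble, each column would instead hold one member's states at all times in the assimilation window, stacked into a single vector.

```python
import numpy as np


def pod_basis(snapshots, energy=0.99):
    """Compute a truncated POD basis for a snapshot matrix.

    snapshots: array of shape (n_state, n_members), one snapshot per column.
    energy: fraction of perturbation variance to retain (illustrative choice).
    Returns the ensemble mean, the retained base vectors, and their
    singular values.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    perturbations = snapshots - mean
    # Thin SVD: the columns of u are orthonormal base vectors.
    u, s, _ = np.linalg.svd(perturbations, full_matrices=False)
    captured = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of modes whose cumulative energy reaches the threshold.
    r = min(int(np.searchsorted(captured, energy)) + 1, len(s))
    return mean, u[:, :r], s[:r]


# Synthetic ensemble (hypothetical data): three smooth modes plus weak noise.
rng = np.random.default_rng(0)
n_state, n_members = 200, 30
x = np.linspace(0.0, 1.0, n_state)
modes = np.stack([np.sin(k * np.pi * x) for k in (1, 2, 3)], axis=1)
coeffs = rng.normal(size=(3, n_members))
snapshots = modes @ coeffs + 1e-3 * rng.normal(size=(n_state, n_members))

mean, basis, singular_values = pod_basis(snapshots, energy=0.999)

# Truncated expansion: each snapshot is the mean plus a combination of
# base vectors. The coefficients alpha are the explicit control variables.
alpha = basis.T @ (snapshots - mean)
reconstruction = mean + basis @ alpha
rel_err = np.linalg.norm(reconstruction - snapshots) / np.linalg.norm(snapshots)
print("retained modes:", basis.shape[1])
print("relative reconstruction error:", rel_err)
```

Because the state is written as the mean plus `basis @ alpha`, any cost function of the state becomes an explicit function of the low-dimensional coefficient vector `alpha`, which is the property that the proposed method exploits.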

[4] Our new method was motivated by the need to merge the Monte Carlo method into the traditional 4DVAR to transform an implicit optimization problem into an explicit one. Our method not only simplifies the data assimilation procedure but also maintains the two main advantages of the traditional 4DVAR. This method is somewhat similar to Qiu et al.'s SVD-based method (referred to as SVD-E4DVAR hereafter; see Appendix B for details) because both begin with a 4-D ensemble obtained from the forecast ensembles. However, they differ significantly in several aspects, as discussed in section 2.2. *Hunt et al.* [2004], *John and Hunt* [2007], and *Szunyogh et al.* [2008] also developed a 4-D ensemble Kalman filter that infers the tangent linear model dynamics from the ensemble instead of using the tangent linear map as in the traditional 4DVAR. In their filter, the model states are expressed directly as linear combinations of the ensemble samples rather than of orthogonal base vectors of the ensemble space. Their method is also largely a Kalman filter, and its ensemble space is generated differently from that of our method.

[5] We conducted several numerical experiments using a one-dimensional (1-D) soil water equation and synthetic observations to evaluate our new method in land data assimilation. Comparisons were also made among our method, SVD-E4DVAR [*Qiu and Chou*, 2006; *Qiu et al.*, 2007a, 2007b], traditional 4DVAR, and EnKF. We found that our new ensemble-based explicit 4DVAR (referred to as POD-E4DVAR) performs much better than the usual EnKF method in terms of both increasing the assimilation precision and reducing the computational costs. It also performs better than the traditional 4DVAR and SVD-E4DVAR, especially when the forecast model is not perfect and the forecast error comes from both the noise of the initial field and the uncertainty in the forecast model. We also evaluated this approach using the Lorenz model. The corresponding assimilation experiments show that POD-E4DVAR can rapidly adjust the forecast state toward the true Lorenz curve by assimilating the observations only twice in an assimilation cycle, which indicates its potential for applications in other fields.