## 1. Introduction

[2] The CO_{2} concentration in the atmosphere is increasing every year because of anthropogenic activities. About half of the CO_{2} released to the atmosphere is absorbed by various land and ocean processes, but spatial and temporal variability of these carbon sinks is not well understood [*Denman et al.*, 2007]. When making policy decisions about CO_{2} emissions, it is important to know the spatial distribution of these carbon sinks, how they function, and for how long they will keep operating.

[3] Inverse modeling has been widely used to estimate the spatial distribution of carbon sinks from observed CO_{2} concentrations in the atmosphere [e.g., *Gurney et al.*, 2002; *Rödenbeck et al.*, 2003; *Michalak et al.*, 2004; *Bruhwiler et al.*, 2005; *Peters et al.*, 2005; *Baker et al.*, 2006]. The outcomes of these inversions vary because of differences in modeled transport and in the spatial representation of the prior fluxes. They also depend strongly on the prescribed prior and observation error covariance matrices, which define the weighting between the priors and the data in these under-determined problems.

[4] *Gurney et al.* [2002] introduced a model intercomparison experiment, widely known as TransCom3, with 16 global atmospheric transport models and model variants. They found a terrestrial carbon sink distributed almost evenly among the Northern Hemisphere continents. The magnitude of the sink was sensitive to transport differences among models. They also found that the CO_{2} uptake in the Southern Ocean was smaller than that calculated from ocean measurements, and this result was not sensitive to the transport models. Early inversions were carried out by dividing the globe into several large regions and by solving for fluxes on monthly or annual time scales. Optimization was done by estimating a single vector of unknowns. This technique is known as a “batch mode” inversion. For example, in the TransCom3 experiment, the globe was divided into 22 regions, consisting of 11 land regions and 11 ocean regions [*Gurney et al.*, 2002]. Large regions were used because of the sparseness of the CO_{2} observation sites. One advantage of this technique is that the problem is mathematically over-determined, because the number of unknowns (the number of regions times the number of time levels) is much smaller than the amount of available observed information. However, a recent study by *Bruhwiler et al.* [2007] indicates that some regions, such as South America and Africa, are poorly constrained by the current observation network because of the weaker signal from these regions. The batch problem is computationally efficient, even for monthly estimation over many years. However, lumping small basis regions into larger combined regions may lead to aggregation errors [*Kaminski et al.*, 2001; *Engelen et al.*, 2002], because observed CO_{2} fields are sensitive to the distribution of sources and sinks within large basis regions.
Batch inversion over large regions cannot adjust these finer-scale patterns of fluxes, so errors in subregional spatial or temporal patterns are unavoidably aliased into errors in the mean fluxes. Sampling sites are usually biased toward fluxes from nearby grid cells and hence cannot properly represent heterogeneous larger regions. Moreover, in the TransCom3 experiment, the spatial distributions of fluxes within the large source regions were demarcated by hard boundaries, which do not exist in reality.

[5] In a Bayesian framework for data assimilation, we optimize a cost function that consists of two components. Mathematically, we define the cost function as

$$
J(\boldsymbol{\beta}) = \frac{1}{2}\left(\mathbf{y} - H\boldsymbol{\beta}\right)^{T}\mathbf{R}^{-1}\left(\mathbf{y} - H\boldsymbol{\beta}\right) + \frac{1}{2}\left(\boldsymbol{\beta} - \boldsymbol{\beta}_{b}\right)^{T}\mathbf{P}_{f}^{-1}\left(\boldsymbol{\beta} - \boldsymbol{\beta}_{b}\right), \tag{1}
$$

where **y** is a vector of observations, *H* is an observation operator, **β** is the vector of unknowns (the state vector we are solving for), **β**_{b} is the prescribed prior (background) estimate, **R** is the observation error covariance matrix, and **P**_{f} is the forecast (prior) error covariance matrix. The first term of the cost function (equation (1)) controls the difference between the observations and the predicted values. The second term constrains the solution by an a priori (or “background”) flux distribution, which is necessary to stabilize the solution in an under-constrained problem. From a statistical point of view, the cost function is the negative logarithm of the kernel (the core of the distribution that depends on the variable **β**) of the posterior distribution. The posterior distribution is defined as the product of the likelihood function and the prior distribution; the two terms of the cost function correspond to the kernels of the likelihood function and the prior distribution, respectively. Here we find an optimal solution for **β** by maximizing the posterior distribution, which corresponds to minimizing the cost function. Assuming a linear observation operator *H*, the solution that minimizes the above cost function is

$$
\hat{\boldsymbol{\beta}} = \boldsymbol{\beta}_{b} + \mathbf{P}_{f}H^{T}\left(H\mathbf{P}_{f}H^{T} + \mathbf{R}\right)^{-1}\left(\mathbf{y} - H\boldsymbol{\beta}_{b}\right), \tag{2}
$$

$$
\hat{\mathbf{P}} = \left(H^{T}\mathbf{R}^{-1}H + \mathbf{P}_{f}^{-1}\right)^{-1}, \tag{3}
$$

where $\hat{\boldsymbol{\beta}}$ is the posterior estimate of the state vector **β** and $\hat{\mathbf{P}}$ is its corresponding posterior covariance [*Tarantola*, 1987].
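As a concrete illustration, the batch-mode solution above can be sketched in a few lines of NumPy. All dimensions and values here are synthetic and chosen for illustration only; they are not taken from any of the studies discussed.

```python
# Sketch of the batch-mode Bayesian solution, assuming a linear
# observation operator H. All sizes and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_state, n_obs = 22, 100                   # e.g., 22 regions, 100 observations
H = rng.normal(size=(n_obs, n_state))      # linear observation operator (Jacobian)
beta_b = np.zeros(n_state)                 # prior (background) state
P_f = np.eye(n_state)                      # prior (forecast) error covariance
R = 0.5 * np.eye(n_obs)                    # observation error covariance
y = H @ rng.normal(size=n_state) + rng.normal(scale=0.5, size=n_obs)

# Posterior mean: beta_hat = beta_b + P_f H^T (H P_f H^T + R)^-1 (y - H beta_b)
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
beta_hat = beta_b + K @ (y - H @ beta_b)

# Posterior covariance: P_hat = (H^T R^-1 H + P_f^-1)^-1
P_hat = np.linalg.inv(H.T @ np.linalg.inv(R) @ H + np.linalg.inv(P_f))
```

At the minimizer, the gradient of the cost function vanishes, and the posterior variances are never larger than the prior ones, which is a quick sanity check on any implementation.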

[6] In large-region inversions like the TransCom3 experiment, it is assumed that grid points within a given region are perfectly correlated in space, with a constant flux value over some period of time (e.g., monthly). In finer-scale (grid-scale) inversions, if we were to assume that grid boxes are uncorrelated, the number of unknowns would become extremely large compared to the number of observations. The problem then becomes under-determined but can still be solved with the help of prior information. The best practical approach lies between these two extremes of perfectly correlated large regions and uncorrelated grid boxes: in a grid-scale inversion, we find such an intermediate solution by correlating the grid cells. The first grid-scale inversion of CO_{2} was introduced by *Kaminski et al.* [1999]. They estimated a coarse grid of fluxes at 8° latitude by 10° longitude resolution on monthly time scales. The problem was highly under-constrained, and a unique solution was found by gathering a priori information on surface fluxes. *Rödenbeck et al.* [2003] performed a grid-scale inversion accounting for the spatial covariance of flux uncertainties. They assumed different de-correlation length scales over the land and the ocean, and monthly fluxes were estimated at 8° latitude by 10° longitude spatial resolution from monthly mean observations. *Michalak et al.* [2004] developed a geostatistical approach, which avoids prescribing a priori fluxes. In their method, *β*_{b} in equation (1) is replaced by a trend term **Xβ**. This modification to the cost function allows one to include additional information, such as vegetation cover, leaf area index, and greenness fraction, that varies with the mean behavior of the fluxes. For example, if we assume that the mean behavior of ocean fluxes differs from that of land fluxes, this can be incorporated into the trend term by simply including a variable that separates the two.
This separation is usually done with an indicator variable that represents land by 1 and ocean by 0, or vice versa. They estimated the parameters of the state covariance matrix, such as the de-correlation length scales and variances (land/ocean), as a byproduct of the optimization scheme rather than prescribing them.
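The idea of a spatially correlated prior with separate land and ocean statistics can be sketched as follows. The parameter values, the exponential correlation form, and the choice of zero land-ocean covariance are illustrative assumptions, not the settings of the cited studies.

```python
# Sketch of a prior flux covariance whose off-diagonal terms decay
# exponentially with distance, with separate variances and de-correlation
# lengths over land and ocean. All parameter values are illustrative.
import numpy as np

def prior_covariance(lon, lat, is_land, sigma_land=1.0, sigma_ocean=0.5,
                     L_land=1000.0, L_ocean=2000.0):
    """Exponential covariance over a set of grid cells (lengths in km)."""
    R_earth = 6371.0
    phi, lam = np.radians(lat), np.radians(lon)
    # great-circle distance between all pairs of cells
    cos_d = (np.sin(phi)[:, None] * np.sin(phi)[None, :] +
             np.cos(phi)[:, None] * np.cos(phi)[None, :] *
             np.cos(lam[:, None] - lam[None, :]))
    d = R_earth * np.arccos(np.clip(cos_d, -1.0, 1.0))
    sigma = np.where(is_land, sigma_land, sigma_ocean)
    L = np.where(is_land, L_land, L_ocean)
    # correlate only cells of the same surface type; land-ocean covariance is zero
    same = is_land[:, None] == is_land[None, :]
    corr = np.exp(-d / np.sqrt(L[:, None] * L[None, :])) * same
    return sigma[:, None] * sigma[None, :] * corr
```

For example, `prior_covariance(lon, lat, is_land)` over a handful of cells yields a symmetric positive semi-definite matrix whose diagonal carries the land or ocean variance of each cell.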

[7] All of these methods use the batch-mode (synthesis) inversion technique, and they perform satisfactorily with the existing observation network. Every year new observation sites become available, many of which record CO_{2} hourly rather than weekly. The observation vector will become extremely large when Orbiting Carbon Observatory (OCO) data are available [*Crisp and Johnson*, 2005]. As more observations become available, we will be able to optimize the fluxes at much finer scales. However, batch-mode inversions are unwieldy in this case because of the need to invert excessively large matrices. Finer-scale estimation of surface sources and sinks is now becoming feasible, but the computational burden and the under-constrained nature of the problem require innovative assimilation methods. *Bruhwiler et al.* [2005] introduced a fixed-lag Kalman smoother to estimate fluxes. Their method steps through the observations sequentially, which avoids the difficulties of the large observation vectors encountered in batch-mode inversions. However, this method requires the precalculation of observation operators, which is still expensive when assimilating hourly observations. Further developing the fixed-lag Kalman smoother, *Peters et al.* [2005] introduced an ensemble-based approach to carbon inversions in which the Kalman gain matrix is approximated using ensemble members. They used the ensemble square root filter, which assimilates observations serially (one at a time) [*Whitaker and Hamill*, 2002]. Serial assimilation can be troublesome in carbon problems because of the need for repeated integration of the transport model, which becomes computationally expensive with very large observation vectors, as in a satellite experiment. *Baker et al.* [2006] and *Chevallier et al.* [2005] introduced variational data assimilation schemes for atmospheric CO_{2}.
Their methods also show promising results with large observation vectors such as OCO data. However, the variational method requires the calculation of backward-in-time transport, also known as the model adjoint. Because transport models are frequently improved, computing and maintaining their adjoints, as required by variational methods, becomes complicated and burdensome. For example, in atmospheric transport models, reversing advection schemes is fairly simple, but reversing convective schemes can be rather difficult because of complicated parameterization schemes with many logical branches. Ensemble methods have the advantage that no model adjoint is needed. The computational costs of ensemble and variational methods are similar, but ensemble methods are more efficient in a parallel computing environment.
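The key advantage of ensemble methods, approximating the Kalman gain from forward model runs alone, can be illustrated generically. The dimensions, the stand-in observation operator, and all values below are invented for illustration; this is not the implementation of any cited study.

```python
# Generic sketch of approximating the Kalman gain from ensemble statistics.
# Only forward runs of the (possibly nonlinear) operator H are needed;
# no adjoint is required. All dimensions and values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_state, n_obs, n_ens = 50, 20, 30

def H(x):
    # stand-in for the transport/observation operator (illustrative only)
    return np.tanh(x[:n_obs])

X = rng.normal(size=(n_state, n_ens))                      # state ensemble
Y = np.stack([H(X[:, k]) for k in range(n_ens)], axis=1)   # observed ensemble

Xp = X - X.mean(axis=1, keepdims=True)     # state perturbations
Yp = Y - Y.mean(axis=1, keepdims=True)     # observation-space perturbations
R = 0.1 * np.eye(n_obs)                    # observation error covariance

# K ~ cov(x, Hx) [cov(Hx, Hx) + R]^-1, estimated from the ensemble
Pxy = Xp @ Yp.T / (n_ens - 1)
Pyy = Yp @ Yp.T / (n_ens - 1)
K = Pxy @ np.linalg.inv(Pyy + R)

y_obs = rng.normal(size=n_obs)
x_mean_analysis = X.mean(axis=1) + K @ (y_obs - Y.mean(axis=1))
```

Because the gain is built from the sampled perturbations, all observations can be assimilated in one step here; serial processing is a separate design choice tied to covariance localization, as discussed below.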

[8] In this paper, we apply a new ensemble-based method, the maximum likelihood ensemble filter (MLEF) [*Zupanski*, 2005; *Zupanski and Zupanski*, 2006], to global CO_{2} inversion. The MLEF has also been applied to regional CO_{2} inversion [*Zupanski et al.*, 2007a]. That regional-scale study focused on estimating biases in gross primary production (GPP) and respiration in North America by assimilating continuous CO_{2} observations from the WLEF tall tower and the “ring of towers” in northern Wisconsin. They investigated the model performance over a wide range of ensemble sizes and found that a reasonable solution can be reached even with small ensembles by applying covariance localization; for very large ensemble sizes, localization was not essential. In this study, we introduce a pseudodata experiment to test the performance of the MLEF by assimilating the currently available (flask, continuous, and aircraft profile) observation sites on the global domain. We allow net surface fluxes of CO_{2} to vary on an hourly basis and solve for persistent multiplicative biases of each component flux in each model grid cell. We assume an idealized case in which the biases stay constant throughout the year, corresponding to errors in slowly varying biogeochemical or land-management parameters, such as forest stand age, nitrogen deposition, or coarse woody debris, which are difficult to simulate accurately everywhere. In reality, model biases may vary seasonally or on some other time scale, but such variations are not considered in the present study. Because no serial assimilation of observations is required, the MLEF is feasible for applications with very large observation vectors (e.g., satellite observations) and hence can be a useful tool in future CO_{2} studies. Serial processing of observations was introduced in ensemble square root filter schemes for the purposes of covariance localization [e.g., *Whitaker and Hamill*, 2002; *Peters et al.*, 2005].
In the MLEF, a different approach to covariance localization is taken, so serial processing of observations is not necessary.
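For readers unfamiliar with covariance localization, a common generic device (illustrative only, and not the specific scheme used in the MLEF) is a Schur (element-wise) product of the sampled covariance with a compactly supported correlation function such as that of Gaspari and Cohn, which suppresses spurious long-range correlations from small ensembles.

```python
# Generic sketch of Schur-product covariance localization with a
# Gaspari-Cohn taper. Illustrative only; not the MLEF localization.
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn compactly supported correlation; r = distance / cutoff."""
    r = np.abs(np.asarray(r, dtype=float))
    f = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    x1, x2 = r[m1], r[m2]
    f[m1] = (-0.25 * x1**5 + 0.5 * x1**4 + 0.625 * x1**3
             - (5.0 / 3.0) * x1**2 + 1.0)
    f[m2] = (x2**5 / 12.0 - 0.5 * x2**4 + 0.625 * x2**3
             + (5.0 / 3.0) * x2**2 - 5.0 * x2 + 4.0 - 2.0 / (3.0 * x2))
    return f

rng = np.random.default_rng(2)
n_grid, n_ens = 40, 10
# distances between cells on a 1-D grid, in grid-spacing units (illustrative)
dist = np.abs(np.subtract.outer(np.arange(n_grid), np.arange(n_grid)))
X = rng.normal(size=(n_grid, n_ens))
Xp = X - X.mean(axis=1, keepdims=True)
P_ens = Xp @ Xp.T / (n_ens - 1)      # rank-deficient, noisy at long range
rho = gaspari_cohn(dist / 5.0)       # cutoff of 5 grid cells (illustrative)
P_loc = rho * P_ens                  # Schur product zeroes far correlations
```

The taper equals 1 at zero separation and vanishes beyond twice the cutoff, so the localized covariance stays symmetric and positive semi-definite while distant entries are exactly zero.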

[9] The remainder of this paper is organized as follows. In section 2, we describe the inversion scheme we used in this study. Section 3 presents the results along with a discussion based on a pseudodata experiment. Finally, section 4 includes the concluding remarks and future directions of our work.