Toward reliable ensemble Kalman filter estimates of CO2 fluxes



[1] The use of ensemble filters for estimating sources and sinks of carbon dioxide (CO2) is becoming increasingly common, because they provide a relatively computationally efficient framework for assimilating high-density observations of CO2. Their applicability for estimating fluxes at high resolutions and the equivalence of their estimates to those from more traditional “batch” inversion methods have not been demonstrated, however. In this study, we introduce a Geostatistical Ensemble Square Root Filter (GEnSRF) as a prototypical filter and examine its performance using a synthetic data study over North America at a high spatial (1° × 1°) and temporal (3-hourly) resolution. The ensemble performance, both in terms of estimates and associated uncertainties, is benchmarked against a batch inverse modeling setup in order to isolate and quantify the degradation in the estimates due to the numerical approximations and parameter choices in the ensemble filter. The examined case studies demonstrate that adopting state-of-the-art covariance inflation and localization schemes is a necessary but not sufficient condition for ensuring good filter performance, as defined by its ability to yield reliable flux estimates and uncertainties across a range of resolutions. Observational density is found to be another critical factor for stabilizing the ensemble performance, which is attributed to the lack of a dynamical model for evolving the ensemble between assimilation times. This and other results point to key differences between the applicability of ensemble approaches to carbon cycle science relative to their use in meteorological applications where these tools were originally developed.

1. Introduction

[2] Over the last decade it has become increasingly apparent that quantification of global carbon sources and sinks with sufficient accuracy and precision is critical to balancing the global carbon budget and monitoring of carbon-management activities [Schimel, 2007]. It has also become clear that our understanding of, and ability to accurately model, the carbon cycle is severely constrained by the sparse distribution of the present atmospheric CO2 measurement network [Scholes et al., 2009]. The sparse and spatially non-uniform network is sufficient neither to constrain regional budgets with the needed certainty, nor to understand the nature, geographic distribution and temporal variability of CO2 sources and sinks. This absence of spatially and temporally dense measurements of atmospheric CO2 has spurred the development of space-based measurement sensors. Measurements from passive sensors such as the Atmospheric Infrared Sounder (AIRS) on Aqua [Chahine et al., 2006], the Tropospheric Emissions Spectrometer (TES) on Aura [Kulawik et al., 2010], the Infrared Atmospheric Sounding Interferometer (IASI) on Met-Op-1 [Crevoisier et al., 2009], the SCanning Imaging Absorption spectroMeter for Atmospheric CartograpHY (SCIAMACHY) on EnviSAT [Buchwitz et al., 2005], the Greenhouse gases Observing SATellite (GOSAT) [Hamazaki et al., 2004], as well as planned future sensors such as the Orbiting Carbon Observatory-2 (OCO-2) [Crisp et al., 2008], and the Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) satellite [National Research Council, 2007] are expected to improve our scientific understanding of regional carbon cycle processes and budgets. Although remote-sensing measurements of CO2 do not achieve the precision possible from in situ measurements [Rayner and O'Brien, 2001], they provide a large number of observations with near-global coverage, beyond what is possible from a surface network alone [e.g., Buchwitz et al., 2007].

[3] The global coverage provided by these space-based measurements has demonstrated promise in improving the accuracy and precision of regionally resolved flux estimates [e.g., Baker et al., 2010; Ciais et al., 2010], but the solution of the associated inverse problem has also resulted in a substantial increase in computational cost. The computational challenge results from the fact that inverse modeling techniques (a.k.a. top-down approaches) have historically been applied in “batch” mode, where the inversion is performed by solving a system of linear equations relating the CO2 fluxes and the atmospheric CO2 observations [e.g., Enting, 2002]. Solving the batch problem requires an atmospheric transport model to be run either once per estimated flux region/period combination, or once per observation if an adjoint to the transport model is available. This becomes computationally infeasible given the increasing spatial and temporal resolution at which CO2 fluxes are being estimated, and the increasing number of concentration measurements available from remote-sensing observations.

[4] To address the increasing computational challenge of atmospheric inversions, data assimilation (DA) techniques (e.g., ensemble Kalman filter methods [Peters et al., 2005; Feng et al., 2009; Miyazaki et al., 2011]; variational methods [Rayner et al., 2005; Chevallier et al., 2005; Rödenbeck, 2005; Baker et al., 2006], or hybrid approaches such as the Maximum Likelihood Ensemble Filter [Zupanski et al., 2007; Lokupitiya et al., 2008]) have recently been employed for estimating CO2 fluxes, in some cases as part of advanced systems where meteorological and carbon variables are simultaneously assimilated [e.g., Kang et al., 2011]. Application of data assimilation techniques to the CO2 problem is, however, much less mature [Rayner, 2010] than its use in numerical weather prediction [Swinbank, 2010, and references therein] or for assimilating other atmospheric constituents such as humidity and ozone [e.g., Rood, 2005; Lahoz and Errera, 2010]. An important question for carbon-cycle research that has hitherto remained unanswered concerns the impact of the numerical data assimilation framework on the precision and accuracy of fine-scale flux estimates and their associated uncertainties. Previous CO2 DA studies have evaluated flux estimates by comparing them to biospheric model and inventory estimates and/or by assessing how well they reproduce available atmospheric CO2 observations [e.g., Peters et al., 2005; Chevallier et al., 2007; Lokupitiya et al., 2008; Feng et al., 2009; Baker et al., 2010; Kang et al., 2011; Miyazaki et al., 2011]. Given the host of error sources (e.g., transport error, aggregation error, etc.) that impact inversions, these diagnostics provide an assessment of the overall inversion framework, but do not isolate any errors incurred due to the numerical approximations in the implemented DA approach.

[5] This study is primarily motivated by an attempt to isolate and quantify such errors, specifically from the perspective of an ensemble Kalman filter applied to the estimation of CO2 fluxes at fine spatial and temporal scales. Ensemble filters and their variants [e.g., Peters et al., 2005, 2007, 2010; Zupanski et al., 2007; Lokupitiya et al., 2008; Feng et al., 2009, 2011; Kang et al., 2011; Miyazaki et al., 2011] have gained popularity within the carbon-cycle community due to their simple conceptual formulation and relative ease of implementation. So far, however, the use of ensemble Kalman filters for estimating fluxes at fine spatial and temporal scales has received only limited examination. Except for Kang et al. [2011] and Miyazaki et al. [2011], where fluxes were estimated at the grid resolution of the atmospheric transport model used in the studies (∼2.8°), almost all other studies have estimated fluxes at large spatial scales (e.g., continental or ecoregion). The temporal scales at which fluxes have been estimated range from several days to weeks.

[6] The work presented here estimates fluxes at substantially finer scales (1° by 1° and 3-hourly) relative to previous applications of ensemble filters. In general, high resolution estimates of carbon fluxes are advantageous for (1) improving budgeting and mechanistic understanding of the carbon cycle at local to regional scales, and (2) minimizing spatial and temporal aggregation errors [e.g., Kaminski et al., 2001; Peters et al., 2010; Gourdji et al., 2010] that may otherwise bias the final flux estimates. The impact of spatial aggregation errors has long been discussed and documented in the inverse modeling literature [e.g., Kaminski et al., 2001; Engelen et al., 2002; Peters et al., 2010], and recent studies have shown that a priori temporal aggregation has similar impacts. Gourdji et al. [2010, 2012] found that biases occurred when fluxes were estimated at multiday or even daily timescales, and recommended a 3-hourly temporal resolution to allow the inversion to resolve the diurnal cycle. Huntzinger et al. [2011] further found that differences between the diurnal representations among a suite of terrestrial ecosystem models yielded significant differences at CO2 monitoring locations, suggesting that adopting a fixed diurnal cycle from one particular model a priori could bias flux estimates at larger scales. Although desirable from a scientific perspective, applying an ensemble approach to a fine-scale flux estimation problem is challenging due to two issues associated with the ensemble filter.

[7] The first challenge, common to all applications of ensemble filters, is the error due to representing the probability density function of the fluxes by a finite number of randomly generated flux realizations or system states. Experience in the NWP area has suggested that because of the finite number of ensemble members, the ensemble filter can suffer from variance underestimation, rank deficiency and sampling error [e.g., Houtekamer and Mitchell, 2005; Anderson, 2007a, 2007b; Ehrendorfer, 2007; Meng and Zhang, 2011], all of which impact both the final estimates and their uncertainty. Anderson [2007a] notes that, even in low-order perfect-model applications for NWP, mitigating the impacts of the limited ensemble size requires the introduction and tuning of several additional algorithms. Considerable expertise exists in these algorithms for NWP [e.g., Hamill and Whitaker, 2005; Anderson, 2007a, 2007b; Uzunoglu et al., 2007; Sacher and Bartello, 2008; Anderson, 2009; Bergemann and Reich, 2010; Bishop and Hodyss, 2011] and for DA of other constituents [e.g., Schutgens et al., 2010]. However, these algorithms and their impact on flux estimates and uncertainties are less well understood for the carbon flux estimation problem. Applications to CO2 have investigated the impact of ensemble size [e.g., Peters et al., 2005; Zupanski et al., 2007; Feng et al., 2009; Miyazaki et al., 2011] and different localization/inflation parameters (see Sections 2.3 and 2.4) but have refrained from drawing conclusions as to the optimal values of parameters that may aid future filter designs. Because previous studies have not compared ensemble filter estimates to those from batch inversions, it is difficult to isolate the impact of the parameter and algorithm choices from other errors present within any inversion framework.

[8] The second challenge, which differentiates the CO2 flux estimation problem from NWP-related applications, is that there is currently no dynamical model to directly evolve the carbon flux state vector forward in time [Peters et al., 2005; Lokupitiya et al., 2008; Miyazaki et al., 2011]. In other words, given the estimated flux at one time, there is no model to predict the flux at the following assimilation time. The lack of such a dynamical model represents a loss of valuable information to the ensemble because, along with the transport model, a dynamical model would capture the flow-dependent error covariance patterns. In NWP-related applications, several studies have been carried out to test the impact of dynamical model errors [e.g., Houtekamer et al., 2005; Szunyogh et al., 2005; Houtekamer et al., 2009; Hamill and Whitaker, 2011], but no study has evaluated the impact of a complete absence of a dynamical model. The absence of the dynamical model may make the ensemble filter extremely sensitive to the observation network and coverage. Given the spatial and temporal variability of atmospheric CO2 measurements (whether in situ or satellite-based), this raises questions about the applicability of ensemble filters for leveraging the information content of available CO2 observations.

[9] In order to understand these issues, we introduce a geostatistical variant of the Bayesian ensemble square root filter (EnSRF) [Whitaker and Hamill, 2002]. The geostatistical ensemble square root filter (GEnSRF) is based on a geostatistical inverse modeling (GIM) formulation of the flux estimation problem [Michalak et al., 2004]. The GIM formulation is not limited to the use of prior CO2 flux information from biospheric models and/or inventories, and has been applied for inversions conducted at very high spatiotemporal resolutions [e.g., Gourdji et al., 2012].

[10] The GEnSRF is used as a prototype filter in exploring the impacts of the challenges outlined above for CO2 flux estimates at fine spatial and temporal scales. The sensitivity of the ensemble filter to different scenarios is judged by comparing the GEnSRF estimates to the estimates from an equivalent batch GIM setup. This comparison is carried out using synthetic data from the growing season (June 2008) over North America. Both GEnSRF and GIM are used to estimate fluxes and their associated uncertainties at a 1° × 1° (spatial) and 3-hourly (temporal) resolution. Test cases are designed to gauge the performance of the ensemble system and to evaluate whether the numerically approximate ensemble scheme can accurately capture the characteristic features of the CO2 cycle, such as the spatial location of sources and sinks and the amplitude and phase of the diurnal flux cycle. The test cases are used to assess the baseline performance of the ensemble system, as well as to explore the impact of the measurement network, ensemble size, and the implementation of covariance inflation and localization algorithms designed to improve ensemble performance.

[11] Overall, this work provides (1) an assessment of the relative performance of the ensemble filter in comparison to the batch approach and of the conditions necessary for the ensemble approach to be a suitable replacement for batch inversions, and (2) an investigation of the error sources in the ensemble system and their implications for adjustments to ensemble systems that need to be made relative to NWP applications. The remainder of this paper is organized as follows. Section 2 provides the rationale for the proposed filter followed by an overview of GEnSRF. Section 3 provides a description of the examined synthetic data test cases. Results are presented and discussed in Section 4. Finally, we conclude in Section 5 with a summary of the findings of this study and recommendations for future research.

2. Methodology

2.1. Choosing a Filter Formulation

[12] The underlying framework in all ensemble filters is a low-rank ensemble representation of the error covariance matrices. The ensembles themselves are scaled matrix square roots of the covariance matrices, and are updated during the assimilation of observations either stochastically [e.g., Houtekamer and Mitchell, 1998; Burgers et al., 1998; Pham, 2001] or deterministically [e.g., Bishop et al., 2001; Anderson, 2001; Whitaker and Hamill, 2002; Ott et al., 2004]. The details of this update step distinguish most ensemble variants. Based on existing studies [e.g., Tippett et al., 2003; Lawson and Hansen, 2004; Nerger et al., 2005; O'Kane and Frederiksen, 2008], it can be concluded that for a linear problem (1) deterministic filters are more accurate than their stochastic counterparts, and (2) although all the deterministic filters will produce analysis ensembles that span the same state subspace and have the same covariance, a serial EnSRF has the lowest computational cost if the observation errors are assumed to be independent.

[13] The simplest serial EnSRF that can be implemented for inferring CO2 surface fluxes is one using a Bayesian formulation (e.g., CarbonTracker [Peters et al., 2010]), which uses prior information about the CO2 fluxes from bottom-up models and/or inventories. Because of the highly ill-posed nature of the CO2 flux estimation problem and the sparseness of the current observing network, the posterior flux estimates and uncertainties are quite sensitive to the a priori prescribed flux patterns and their associated error covariance parameters [Peters et al., 2010]. By adapting the ensemble system to the geostatistical approach, we avoid some of the reliance on prior/model assumptions, albeit at the cost of an increase in complexity. Therefore, it can be argued that the niche filled by a geostatistical ensemble square root filter lies in more directly isolating the information content of the available atmospheric measurements.

2.2. Geostatistical Ensemble Square Root Filter

[14] The GEnSRF, like the EnSRF, is a Monte Carlo technique based on a state-space formulation of the Kalman Filter using an ensemble of model states to represent, propagate and update the estimates of the state and state error covariance. The aim is to minimize a cost function of the form:

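\[
L_{s,\beta} = \tfrac{1}{2}\,\bigl(z - h(s)\bigr)^{T} R^{-1} \bigl(z - h(s)\bigr) + \tfrac{1}{2}\,(s - X\beta)^{T} Q^{-1} (s - X\beta) \qquad (1)
\]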

where z is an n × 1 vector of observations, h represents the atmospheric transport model, s is an m × 1 state vector composed of the discretized unknown surface flux distribution, R is the n × n model-data mismatch covariance, X is an m × 1 vector of ones in the test cases presented in this work but could also include auxiliary variables related to carbon flux (see Gourdji et al. [2008, 2012] for further details on the selection of auxiliary variables), β is an unknown constant here but could also include unknown drift coefficients that scale the auxiliary variables in X, the prior covariance matrix Q describes the expected variability in flux departures from Xβ as a function of the separation distance in space and time between fluxes (see Section 3.1.3 for further details on the structure of Q), and T represents the transpose operator. In a batch setup, instead of running the transport model h directly as part of the inversion, an n × m sensitivity matrix H (a.k.a. Jacobian matrix) is generated that represents the sensitivity of the observations z to the fluxes s (i.e., H_ij = ∂z_i/∂s_j).

[15] Equation (1) represents a compromise between reproducing the atmospheric measurements (z) and staying close to the statistical model of the trend (Xβ), where the covariance matrices determine the relative weight of these competing objectives. Although some implementations of ensemble approaches include more variables in the state vector s, including atmospheric concentrations of CO2 themselves, the focus here is on constraining only the underlying fluxes. Correspondingly, any updates in the atmospheric CO2 distribution must be attributable to a change in the underlying fluxes.
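The trade-off described above can be made concrete with a small numerical sketch. The function below evaluates a two-term objective of the kind described for equation (1), assuming a linear transport operator (so that h(s) = Hs) and the conventional factor of one half on each term; the function name `gim_cost` and the toy dimensions are illustrative choices and are not taken from the study.

```python
import numpy as np

def gim_cost(s, beta, z, H, X, R, Q):
    """Evaluate a geostatistical-style objective for a linear transport operator.

    Assumptions (not from the paper): h(s) = H @ s, and each quadratic
    term carries the conventional factor of 1/2.
    """
    resid_obs = z - H @ s          # misfit to the observations
    resid_trend = s - X @ beta     # flux departure from the trend X beta
    data_term = 0.5 * resid_obs @ np.linalg.solve(R, resid_obs)
    trend_term = 0.5 * resid_trend @ np.linalg.solve(Q, resid_trend)
    return data_term + trend_term

# Toy problem: m = 3 fluxes, n = 2 observations, constant trend (X is a column of ones).
X = np.ones((3, 1))
beta = np.array([1.0])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
R = np.eye(2)
Q = np.eye(3)
s = X @ beta                       # fluxes that sit exactly on the trend
z = H @ s                          # observations reproduced exactly
print(gim_cost(s, beta, z, H, X, R, Q))
```

Increasing the entries of R relative to Q down-weights the data term, so the minimizer moves toward the trend; decreasing them has the opposite effect.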

[16] The GEnSRF is implemented as a smoother [e.g., Bruhwiler et al., 2005; Michalak, 2008], such that individual time steps through the smoother include (a) fluxes that are no longer being estimated, (b) fluxes that are being updated at least for the second time (i.e., that have been previously estimated), and (c) fluxes being estimated for the first time (i.e., for which no prior information is available). In the following discussion, the m × 1 vector of the estimated surface flux distribution is represented as inline image. inline image denotes estimates of fluxes that are being updated at least for the second time, inline image denotes fluxes that are being obtained for the first time, and inline image denotes both sets of fluxes being estimated. Finally, the superscripts a and b denote the analyzed (or updated) estimate and the previous (or background) estimate, respectively.

[17] Given an initial prior covariance Qb, GEnSRF starts by creating an ensemble of N state fields (where N ≪ m). These are created as unconditional realizations of the matrix Qb through Cholesky decomposition.

display math

where sb represents the estimated error statistics of CO2 flux deviations from the trend. In the limit of N → ∞ this representation of Qb is exact. In GEnSRF, observations are assimilated serially. When the ith observation is being assimilated, the estimates of fluxes are given by:

display math

where Λ is calculated by solving the following system of equations,

display math

[18] Consistent with a GIM framework, fluxes being estimated for the first time ( inline image) need not be initialized with a prior value, and the zero in equation (3) is not equivalent to a prior in the classical Bayesian setup. Since direct matrix computation of inline image and inline image can be expensive, these are approximated by running the transport model directly with the ensemble of state deviations.

display math
display math
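As a rough illustration of the ensemble approximation described in paragraphs [17] and [18], the sketch below draws unconditional realizations from a covariance matrix through its Cholesky factor and then approximates the products Q Hᵀ and H Q Hᵀ from the transported deviations. Several choices here are assumptions made for illustration only: the one-dimensional exponential covariance, the 1/(N − 1) normalization, the use of a stored matrix `H` in place of a transport model run, and all function names.

```python
import numpy as np

def exponential_covariance(coords, variance, length_scale):
    """Hypothetical 1-D exponential covariance model standing in for Q^b."""
    d = np.abs(coords[:, None] - coords[None, :])
    return variance * np.exp(-d / length_scale)

def draw_deviations(Q, n_members, rng):
    """Unconditional realizations of Q via Cholesky, with the ensemble mean removed."""
    L = np.linalg.cholesky(Q)
    S = L @ rng.standard_normal((Q.shape[0], n_members))
    return S - S.mean(axis=1, keepdims=True)

def ensemble_products(H, S):
    """Approximate Q H^T and H Q H^T from the deviations S (m x N)."""
    N = S.shape[1]
    HS = H @ S                      # stands in for transporting each member
    QHt = S @ HS.T / (N - 1)        # m x n
    HQHt = HS @ HS.T / (N - 1)      # n x n
    return QHt, HQHt

rng = np.random.default_rng(0)
coords = np.arange(20.0)
Q = exponential_covariance(coords, variance=1.0, length_scale=3.0)
H = rng.uniform(0.0, 1.0, size=(3, 20))
S = draw_deviations(Q, n_members=20000, rng=rng)
QHt, HQHt = ensemble_products(H, S)
rel_err = np.linalg.norm(HQHt - H @ Q @ H.T) / np.linalg.norm(H @ Q @ H.T)
print(rel_err)
```

With a very large ensemble the approximation approaches the exact products; rerunning with a few tens of members gives a sense of the sampling error that localization and inflation are meant to mitigate.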

[19] Once Λ is obtained, it is used in equation (3) to estimate the fluxes. If the same Λ is used to update the ensemble of state deviations from the mean inline image, it would result in an underestimation of the analysis error covariance [Whitaker and Hamill, 2002]. Instead Λ is reduced in magnitude ( inline image; equation (7)) such that the spread of the ensemble is reduced less by the analysis (equation (8)), in order to maintain an error covariance consistent with the full-rank Kalman filter.

display math
display math

[20] When observations are serially processed, equation (7) reduces to the computation of a scalar factor. Notice that the piece inline image is already available, and hence updating the ensemble via equation (8) is no more computationally expensive than equation (3).
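The scalar reduction mentioned above can be sketched for a single observation. The code below follows the standard serial EnSRF of Whitaker and Hamill [2002] in its Bayesian form, in which the deviations are updated with a gain scaled by a factor 1/(1 + sqrt(r/(hqh + r))); it does not include the geostatistical trend terms of GEnSRF, and the function and variable names are hypothetical.

```python
import numpy as np

def ensrf_assimilate_one(mean, S, h_row, z_i, r_i):
    """Assimilate one scalar observation with a serial EnSRF update.

    mean  : (m,) ensemble-mean state
    S     : (m, N) deviations from the mean
    h_row : (m,) linearized observation operator for this observation
    z_i   : observed value, r_i : observation-error variance
    """
    N = S.shape[1]
    hs = h_row @ S                        # deviations in observation space, (N,)
    hqh = hs @ hs / (N - 1)               # scalar ensemble estimate of h Q h^T
    qh = S @ hs / (N - 1)                 # (m,) ensemble estimate of Q h^T
    gain = qh / (hqh + r_i)               # gain used for the mean
    alpha = 1.0 / (1.0 + np.sqrt(r_i / (hqh + r_i)))  # scalar reduction factor
    mean_a = mean + gain * (z_i - h_row @ mean)
    S_a = S - np.outer(alpha * gain, hs)  # deviations use the reduced gain
    return mean_a, S_a

rng = np.random.default_rng(1)
S = rng.standard_normal((4, 6))
S -= S.mean(axis=1, keepdims=True)
mean = np.zeros(4)
h_row = np.array([1.0, 0.0, 0.5, 0.0])
mean_a, S_a = ensrf_assimilate_one(mean, S, h_row, z_i=2.0, r_i=0.5)
```

Because the factor is a scalar and the product `h_row @ S` is reused for both the mean and the deviations, the deviation update adds little cost beyond the mean update. With this factor, the sample covariance of the updated deviations matches the Kalman analysis covariance computed from the sample background covariance.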

[21] Finally, before assimilating the next observation, we update the sampled observational ensemble and the sampled observation state corresponding to all future observations that are yet to be assimilated:

display math
display math

[22] Equations (9) and (10) require two additional transport model runs that could be avoided as described in Peters et al. [2005] by approximating these equations in a manner similar to equation (5). When the ensemble size is much smaller than the size of the state vector (as will be the case when fluxes are estimated at a high spatial and temporal resolution) this results in a poor approximation, however. The additional cost of running the transport model might well offset the errors incurred due to the approximation. However, with satellite measurements, running the transport model each time an observation is assimilated makes the direct implementation of equations (9) and (10) impractical. Hence, work is underway to find suitable alternatives to both these equations without incurring large errors in the analysis. In the work presented here, these additional runs are performed.

[23] Using equations (3) to (10), a best estimate of CO2 fluxes is obtained. Once all observations have been assimilated, the a posteriori covariance Qa for the flux estimates is reconstructed from the ensemble (equation (11)). The diagonal values of this posterior covariance matrix correspond to the uncertainty (expressed as a variance) of each estimated flux component in inline image.

display math

[24] Finally, in regular NWP applications a dynamical model (nonlinear forecast operator) would have been used to propagate the state vector between the two observational time periods. For the CO2 problem, no suitable deterministic model is available to directly propagate fluxes from one time step to the next. This differentiates the CO2 flux estimation problem from the NWP and other trace gas assimilation problems, and may have critical implications for good filter performance. Note that this drawback is not specific to GEnSRF but to all variants of the ensemble filter that have been employed for CO2 flux estimation.

2.3. Covariance Localization

[25] Covariance localization aims to heuristically improve the error covariance estimates in the case of small ensemble sizes. In all ensemble filters (including GEnSRF), the number N of ensemble members is small relative to the size m of the state-space, hence the representation of the prior covariance matrix in N-dimensional space is not perfect. This results in a number of erroneous flux correlations, as a consequence of which a state variable may be incorrectly impacted by an observation that is physically remote.

[26] Several covariance localization techniques have been proposed for the NWP problem [Houtekamer and Mitchell, 2001; Hamill et al., 2001; Ott et al., 2004; Anderson, 2007b] to account for the statistical noise of the ensemble. For CO2 applications, implemented localization schemes have varied depending on the particular ensemble filter variant being used. For example, Peters et al. [2005] chose a simple exponential decay function, while Miyazaki et al. [2011] subjectively specify different cutoff radii based on the type and location of observation data used in their analysis. Zupanski et al. [2007] and Lokupitiya et al. [2008] chose a more dynamic scheme based on information theory, where the localization length scale is a function of the information content in the assimilated observations.

[27] Similarly to Peters et al. [2005], we implement a simple covariance localization scheme in GEnSRF. This is achieved by performing a Schur (or Hadamard [Horn and Mathias, 1990]) product, or element-wise multiplication (denoted • in equation (12)), of a correlation matrix ρ with the covariance model generated by the ensemble as shown in equation (12).

display math

[28] Here, ρ is defined using a standard fifth-order Gaspari-Cohn function [Gaspari and Cohn, 1999] with a finite length scale. Note that Peters et al. [2005] used an exponential decay function to define their ρ. Both the Gaspari-Cohn function and the exponential function are compactly supported [Gneiting, 2002; Bergemann and Reich, 2010], which means that the function is nonzero in only a small (local) region specified by a length scale. We find that the overall conclusions presented in Section 4.2 are valid for a variety of compactly supported functions. The key ingredient in all compactly supported functions is the length scale, which ensures that spurious correlations are removed, but correctly specified physical correlations are maintained and not excessively damped.
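A minimal sketch of this localization step is given below. It implements the fifth-order piecewise rational function of Gaspari and Cohn [1999], which equals 1 at zero separation and vanishes beyond twice the length-scale parameter, and applies it to an ensemble covariance through an element-wise product. The one-dimensional coordinates, the convention that `c` is half the cutoff distance, and the function names are assumptions made for illustration; the study itself localizes in space and time.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order Gaspari-Cohn taper; zero for separations >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    rho = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    rho[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                  - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    rho[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                  + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return rho

def localize(Q_ens, coords, c):
    """Schur (element-wise) product of the taper with an ensemble covariance."""
    d = np.abs(coords[:, None] - coords[None, :])
    return gaspari_cohn(d, c) * Q_ens

coords = np.array([0.0, 1.0, 5.0])
Q_ens = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.0]])
Q_loc = localize(Q_ens, coords, c=1.0)
```

In this toy case the covariance between the first and third points, which are separated by more than 2c, is set to zero, while the variances on the diagonal are left unchanged.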

[29] Covariance localization using the Schur product is a simple approach to increasing the effective rank of the covariance matrix, but there are several important caveats for CO2 applications. Previous studies [e.g., Lokupitiya et al., 2008] have raised questions regarding selection of an appropriate localization length scale, and whether the atmospheric advection of CO2 is consistent with the use of a compactly supported correlation function such as the Gaspari-Cohn function. Since the prior covariance matrix holds information on the spatial and temporal autocorrelation of flux deviations from the trend (Section 3.1.3), by including the Schur product and thereby modifying this matrix, covariance localization may disrupt the autocorrelation structure (see Karspeck and Anderson [2007], Oke et al. [2007], and Kepert [2009] for a similar discussion related to NWP problems). In spite of these concerns, in this study, we have persisted with the Gaspari-Cohn function because we want to assess the applicability of this simple scheme for atmospheric CO2 inversions. Future work could explore the applicability of more sophisticated dynamic localization schemes [e.g., Zupanski et al., 2007] or balance-aware localization schemes proposed for NWP [e.g., Bishop and Hodyss, 2011; Kepert, 2011; Jun et al., 2011] or more adaptive techniques based on the prior ensemble [e.g., Anderson, 2012].

2.4. Adaptive Covariance Inflation

[30] A second algorithm for combatting insufficient variance in ensemble filters is covariance inflation. Insufficient variance (or undersampling) is primarily caused by sampling error resulting from the use of small ensembles [Furrer and Bengtsson, 2007]. Over successive assimilation periods, undersampling can become more severe, and in the worst case can lead to filter divergence, where the filter effectively rejects the observations and the assimilation reduces to the prior. Also, because the posterior analysis error covariance is generated from the ensemble at the end of the assimilation (equation (11)), insufficient variance leads to an under-estimation of the analysis error covariance (i.e., the flux uncertainties).

[31] Several ad hoc and adaptive techniques have been proposed in NWP applications to counter this loss of variance [e.g., Anderson and Anderson, 1999; Zhang et al., 2004; Hamill and Whitaker, 2005; Anderson, 2007a; Li et al., 2009; Anderson, 2009; Peña et al., 2010]. Various inflation schemes have also been employed with CO2 applications, depending on the variant of ensemble Kalman filter used in a particular study. Some have avoided using the ensemble spread as a measure of uncertainty altogether by instead deriving final uncertainties from a set of sensitivity experiments [e.g., Peters et al., 2010]. Feng et al. [2009] chose to use an ensemble of the same size as the state vector, thereby minimizing undersampling directly. For the maximum likelihood ensemble filter [e.g., Zupanski et al., 2007; Lokupitiya et al., 2008] a multiplicative inflation scheme was used for covariance inflation, where the ensemble is inflated by a constant factor that is homogeneous in space (although different inflation factors are used for land and ocean regions) and time. This approach has some limitations because neither the observation network nor the CO2 dynamics are homogeneous in space and time, and the cost of tuning experiments to find an appropriate inflation factor that is applicable everywhere is prohibitive. Recognizing these drawbacks, more recent studies have employed either conditional covariance inflation [Miyazaki et al., 2011] or a mix of adaptive and covariance relaxation techniques [Kang et al., 2011].

[32] In GEnSRF, we adopt the more generalized version of the adaptive technique used by Kang et al. [2011] (as originally proposed by Anderson [2009]) to calculate spatially and temporally varying inflation factors for each state component (i.e., flux at each time and grid point). This adaptive algorithm applies Bayesian estimation theory to the probability density function of the inflation factors. First, a normally distributed inflation random variable is associated with each element of the state vector. Then, via Bayes theorem, these inflation factors are incrementally updated during serial assimilation of the observations. Note that the atmospheric CO2 observations can be used to optimize the inflation factors for the CO2 fluxes due to the link between these quantities provided by the atmospheric transport model.
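To show how per-component inflation factors act on an ensemble once they are known, the sketch below scales each row of the deviation matrix by the square root of its inflation factor, which multiplies the corresponding ensemble variance by that factor. This illustrates only the application step; the Bayesian update of the inflation factors described by Anderson [2009] is not reproduced here, and the function name and example values are hypothetical.

```python
import numpy as np

def inflate(S, lam):
    """Inflate deviations S (m x N) by per-component variance factors lam.

    lam may be a scalar (homogeneous inflation) or an array of length m
    (one factor per state component, e.g. per flux grid cell and time).
    """
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (S.shape[0],))
    return np.sqrt(lam)[:, None] * S

rng = np.random.default_rng(2)
S = rng.standard_normal((3, 50))
S -= S.mean(axis=1, keepdims=True)
lam = np.array([1.0, 1.5, 4.0])      # illustrative factors, not estimated
S_inf = inflate(S, lam)
```

A factor of 1 leaves a component untouched, so spatially varying factors allow poorly observed and well observed regions to be treated differently, which a single homogeneous factor cannot do.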

[33] In order to calculate the spatially and temporally varying inflation factors, however, it is necessary to implement covariance localization first. The adaptive technique uses sample correlations of the ensemble between observation space and the model space to convert the inflation estimates in the observation space to those in the model space. Covariance localization plays an important role in reducing the sampling noise in the sample correlations. If no covariance localization is pursued, then the sampling error manifests itself in the adaptive inflation step resulting in spurious inflation factors. Thus, using the adaptive technique, we have specifically adjusted the covariance inflation strategy to take into account the information provided by the atmospheric CO2 measurements.

[34] We refer the reader to Anderson [2009] (and the subsequent review by Miyoshi [2011]) for the mathematical underpinnings of the adaptive approach. It is worthwhile to reiterate that this particular adaptive technique has not previously been applied to any CO2 inversion study. Hence, as part of the sensitivity tests described later, we will examine both the advantages and disadvantages of this adaptive technique for the CO2 source-sink estimation problem.

3. Sample Application

[35] The GEnSRF approach is applied to a synthetic data study over the North American continent (Section 3.1). A series of analyses are designed (Section 3.2) to compare the estimates from GEnSRF with the estimates from GIM. These comparisons are done by aggregating the posterior estimates to a range of spatial and temporal scales (Section 3.3) to assess the accuracy and precision of the ensemble approach relative to a batch inversion.

[36] In the context of this study, the main advantage of the GIM approach relative to the GEnSRF technique is that it solves the entire system of equations analytically (without any approximations) and hence provides a “gold” standard for evaluating the ensemble results. By keeping the atmospheric data sets consistent for GEnSRF and GIM, it is possible to isolate the degradation due to the numerical approximations in the ensemble filter framework. The drawback of this setup is that the effects of the transport model errors have been removed by using the same transport model to both create the synthetic measurements as well as estimate the fluxes in the inversion.

3.1. Experimental Design

3.1.1. Flux Data and Basis Functions

[37] Biospheric fluxes from the Carnegie Ames Stanford Approach terrestrial carbon cycle model, as configured for the Global Fire Emissions Database v2 project (henceforth referred to as CASA-GFEDv2) [Randerson et al., 1997; van der Werf et al., 2006] are used as the true fluxes for generating the synthetic atmospheric data. The monthly averaged 1° × 1° CASA-GFEDv2 Net Ecosystem Exchange (NEE) for June 2008 (Figure 1) is temporally downscaled to 3-hourly resolution using the method of Olsen and Randerson [2004].

Figure 1.

“True” CASA-GFEDv2 fluxes aggregated to the monthly scale.

[38] The sensitivity matrix H is obtained by coupling the Weather Research Forecasting (WRF) model [Skamarock et al., 2005] to the Stochastic Time-Inverted Lagrangian Transport Model (STILT) [Lin et al., 2003], as outlined in Gourdji et al. [2010]. Calculating and pre-storing the sensitivity matrix H is necessary for performing the batch GIM analysis but not for GEnSRF, where the transport model can be run directly as part of the DA system. Given that H was available in this case, it is also used as the transport model for the ensemble implementation.

3.1.2. Synthetic Observation Data

[39] The basis functions generated via WRF-STILT are used with the CASA-GFEDv2 fluxes to generate the synthetic observations z (i.e., h(s)) for the 35 continuous observation towers (see Table S1 in auxiliary material Text S1) that were operational in June 2008 (Figure 2). First, a full set of synthetic data is generated for all the towers at the 3-hourly scale, and small random errors (standard deviation of 0.1 ppm) are then added to the synthetic data. Such small errors were used to represent a best-case, albeit somewhat unrealistic, scenario for the performance of the ensemble approach. Next, only afternoon measurements are retained for the shorter towers (height ≤ 150 m), consistent with typical data choices in inversion studies [e.g., Göckede et al., 2010; Gourdji et al., 2010] and motivated by lower transport model errors for afternoon conditions [e.g., Geels et al., 2007; Gerbig et al., 2008]. Finally, data gaps are simulated in the synthetic observations consistent with missing data from the actual June 2008 observations (due to either instrument down time or calibration needs). Because these outages in data collection and storage are mimicked, the synthetic data set is highly variable (in both space and time) but realistic. The ratio of the number of fluxes (m) to the number of observations (n) is on the order of ∼250:1. By comparison, if the full set (i.e., eight 3-hourly averaged observations per day) of synthetic observations without data gaps were retained, then the ratio would be on the order of ∼75:1.
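The observation-generation steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the random matrix standing in for the WRF-STILT sensitivities, the random "true" fluxes, the tower heights, the afternoon and data-gap flags, and all variable names are hypothetical. Only the noise standard deviation (0.1 ppm) and the 150 m height threshold are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_flux = 200, 1000          # toy sizes, far smaller than the study's
H = rng.random((n_obs, n_flux))    # stand-in for WRF-STILT sensitivities
s_true = rng.normal(size=n_flux)   # stand-in for the "true" fluxes

# Noise-free synthetic observations plus small random errors (sd = 0.1 ppm).
noise_sd = 0.1
z_full = H @ s_true + rng.normal(0.0, noise_sd, size=n_obs)

# Hypothetical per-observation metadata, used only to illustrate the filtering.
tower_height = rng.choice([30.0, 100.0, 300.0, 450.0], size=n_obs)  # metres
is_afternoon = rng.random(n_obs) < 0.25
is_missing = rng.random(n_obs) < 0.2                                # data gaps

# Short towers (height <= 150 m) keep afternoon data only; gaps are dropped.
keep = ((tower_height > 150.0) | is_afternoon) & ~is_missing
z, H_kept = z_full[keep], H[keep]

ratio = n_flux / z.size   # number of fluxes per retained observation
```

Dropping rows of H together with the corresponding observations keeps the two consistent, so the same filtered pair can be passed to either a batch or an ensemble inversion.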

Figure 2.

Location of the 35 tower network (stars), and the regions used for interpreting the flux estimates, i.e., North America and the MCI region (green shaded area). The background grid represents the flux estimation resolution of 1° × 1°. The three-letter codes for the towers are defined in Table S1 in auxiliary material Text S1.

3.1.3. Error Covariance Matrices

[40] The model-data mismatch covariance matrix R is a diagonal matrix, with values of 0.01 ppm2 along the diagonal (i.e., all towers are assumed to have the same model-data mismatch error), corresponding to the variance of the errors introduced into the synthetic observations.

[41] The prior covariance matrix Qb captures the spatiotemporal autocorrelation of the flux deviations from the model of the trend Xβ. In this study, only spatial correlation is assumed a priori in order to keep the structure of Qb simple, although accounting for both spatial and temporal correlation could further improve estimates [e.g., Gourdji et al., 2010; Chevallier et al., 2012].

[42] Qb is prescribed as a block diagonal matrix, with each block describing the correlation between grid-scale fluxes for each time period of the inversion. Based on previous work [e.g., Michalak et al., 2004; Gourdji et al., 2010], each block is modeled by an exponential covariance function:

$$Q_b(d) = \sigma^2 \exp\left(-\frac{d}{l}\right) \tag{13}$$

where d is the spatial separation distance between the grid points where fluxes are to be estimated, σ2 represents the variance of the flux residuals at large separation distances, and l is the range parameter. The correlation length beyond which correlation between the flux residuals becomes negligible is approximately 3l [Chilès and Delfiner, 1999] for an exponential model.
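As an illustration of how such a block diagonal prior covariance could be assembled, the sketch below builds one exponential block from great-circle distances and repeats it along the diagonal. The toy grid extent, the number of time periods, and the helper names are assumptions; σ and the correlation length 3l use the values reported for TC1 in Section 3.2.

```python
import numpy as np

def great_circle_km(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances (km) between points given in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def exponential_block(dist_km, sigma2, l_km):
    """One spatial block of Q_b: sigma^2 * exp(-d / l)."""
    return sigma2 * np.exp(-dist_km / l_km)

# Toy 1 degree x 1 degree grid (hypothetical extent, not the study domain).
lats, lons = np.meshgrid(np.arange(40.0, 45.0), np.arange(-100.0, -94.0),
                         indexing="ij")
dist = great_circle_km(lats.ravel(), lons.ravel())

sigma = 6.7            # flux standard deviation reported for TC1, umol/(m2 s)
l_km = 1630.0 / 3.0    # range parameter from the reported correlation length 3l
block = exponential_block(dist, sigma ** 2, l_km)

# No temporal correlation a priori: Q_b is block diagonal over time periods.
n_times = 3
Q_b = np.kron(np.eye(n_times), block)

corr_at_3l = np.exp(-3.0)   # about 0.05, i.e., a negligible correlation
```

At a separation of 3l the correlation has dropped to roughly 5%, which is why 3l is quoted as the practical correlation length for the exponential model.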

[43] The covariance parameters in Qb (i.e., σ2 and l) can be obtained via different methods [e.g., Michalak et al., 2005; Rödenbeck, 2005; Chevallier et al., 2010] ranging from analyzing the variability in biospheric model outputs to statistically inferring these parameters directly from the atmospheric measurements. In this study, we follow the latter approach, and optimize for the covariance parameters using the Restricted Maximum Likelihood [e.g., Kitanidis, 1995; Michalak et al., 2004] approach.

3.2. Test Cases

[44] Two primary inversion setups (TC1 and TC2) are outlined, both of which estimate 3-hourly fluxes at a 1° × 1° scale over North America for the month of June 2008. However, TC1 uses a sparse measurement data set (as described in Section 3.1.2), while TC2 uses all 24 h of measurements for all towers, yielding a temporally denser and homogeneous data set. Covariance parameters were estimated separately for the two test cases. GIM is run once for each test case to obtain the batch estimates, while the GEnSRF is run multiple times for both TC1 and TC2 with different configurations of ensemble size, localization and inflation parameters. The details of these runs are expanded upon in the following paragraphs.

[45] A control run of GEnSRF is defined based on TC1 with a 500-member ensemble and without any covariance localization or adaptive inflation. This run, designated as TC1E500, is used to gauge the incremental benefits of subsequent modifications.

[46] Given the absence of a dynamical model to propagate the state vector forward in time, our hypothesis is that the inversion conditions (at least in terms of measurement quantity and density) may play a significant role in the performance of the ensemble filter. Miyazaki et al. [2011] concluded that the absence of a dynamical model resulted in the posterior analysis being sensitive to the initial error covariance, but this earlier study did not test the influence of the measurement network. TC2 is designed to explore the impact of the measurement network sparseness as it represents the best possible scenario that one can attain with the existing ground-based continuous measurement network. If this were a real-data study, there would be several caveats regarding using all 3-hourly measurements, especially from shorter towers [Gourdji et al., 2012]. Hence other inversion scenarios in which measurements are progressively reduced in space and time were also evaluated. The conclusions from these additional test cases mirrored those from TC1 and TC2, and hence these have been omitted here for the sake of brevity.

[47] In order to provide insight into an optimal and practical setup of the ensemble filter that can provide accurate flux estimates of CO2 with reliable uncertainties, the parameters of the ensemble system were varied for both TC1 and TC2. The GEnSRF is run with three ensemble sizes: 100, 500, and 2500, denoted as E100, E500, and E2500, respectively. In addition, three different localization length scales were prescribed: 500, 1500 and 3000 km, denoted as L500, L1500 and L3000, respectively. Finally, the adaptive inflation algorithm requires a priori estimates of the inflation factors and their associated variance. Again, three different specifications were provided, each with a prior inflation factor of 1 and with a prior standard deviation of 0.01, 0.05, or 0.25. These runs are denoted as I001, I005 and I025, respectively. These parameters were chosen based on a combination of extensive literature review of ensemble filter applications, subjective knowledge of CO2 transport and its correlation scales, and some preliminary testing with a 1D advection-diffusion problem.

[48] Overall, a total of 2 GIM and 56 GEnSRF runs were carried out for this study. The 2 GIM runs represent the batch estimates for each of the test cases, and are simply denoted as GIM TC1 and GIM TC2. For GEnSRF, the first run for each setup is with an ensemble size of 500 and without any localization and inflation applied (i.e., the control run GEnSRF TC1E500 and GEnSRF TC2E500). GEnSRF is then run with varying ensemble sizes, localization length scales, and different a priori inflation values as described above. As an example, the GEnSRF run for setup TC1 using a 500 member ensemble, a localization length scale of 1500 km and a prior inflation factor of 1 with a standard deviation of 0.05 is denoted GEnSRF TC1E500_L1500I005.
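The run-naming convention can be made explicit with a small helper. The function below is purely illustrative (its name and arguments are not from the study); it only encodes the labels described above. The full factorial grid it builds is a hypothetical enumeration and should not be read as the list of runs actually performed.

```python
def run_name(test_case, ensemble_size, loc_km=None, infl_sd=None):
    """Build a run label such as 'TC1E500_L1500I005' (illustrative helper)."""
    if infl_sd is not None and loc_km is None:
        # The adaptive inflation relies on localized sample correlations.
        raise ValueError("adaptive inflation is only run together with localization")
    name = f"{test_case}E{ensemble_size}"
    if loc_km is not None:
        name += f"_L{loc_km}"
    if infl_sd is not None:
        # 0.01 -> I001, 0.05 -> I005, 0.25 -> I025
        name += f"I{round(infl_sd * 100):03d}"
    return name

ensemble_sizes = (100, 500, 2500)
loc_scales_km = (500, 1500, 3000)
infl_sds = (0.01, 0.05, 0.25)

# Full factorial of the three settings for one test case (27 combinations).
tc1_grid = [run_name("TC1", e, l, i)
            for e in ensemble_sizes for l in loc_scales_km for i in infl_sds]

control = run_name("TC1", 500)               # 'TC1E500'
example = run_name("TC1", 500, 1500, 0.05)   # 'TC1E500_L1500I005'
```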

[49] Finally, as mentioned previously, the covariance parameters in Qb (equation (13)) are optimized separately for the two inversion setups. The flux standard deviation (σ) was 6.7 μmol/(m2 s) for TC1 and 6.4 μmol/(m2 s) for TC2, while the correlation length (3l) was 1630 km for TC1 and 1590 km for TC2. For all the runs, GEnSRF was spun up for 8 days prior to June 1, 2008. The lag window for the smoother was set to 10 days to take into account that CO2 information is preserved over the continent for a maximum of 10 days. Note that a much longer lag window would have been required for global applications where there is no finite residence time for an air mass in the domain, or if substantial flux temporal correlation had been assumed a priori. Even with a 10-day window, longer integrations of the transport model are required and more parameters need to be estimated. Thus, to represent the covariance matrix properly, a large number of ensemble members becomes necessary.

3.3. Evaluating the Analysis

[50] The posterior flux estimates from the different GEnSRF runs and GIM are compared using both quantitative and qualitative metrics at different spatial and temporal scales. Results are presented for a subset of GEnSRF runs (Table 1) that answer the specific questions posed in the study, and other setups are discussed where appropriate.

Table 1. Summary of the GEnSRF Configurations Reported in Section 4

| Test Case Name | Inversion Setup | Number of Observations | Parameters: Ensemble Size | Parameters: Localization Length | Parameters: Prior Inflation Standard Deviation |
|---|---|---|---|---|---|
| GEnSRF TC1E500 | TC1 | 3-hourly with data gaps (a, b) | 500 | n/a | n/a |
| GEnSRF TC1E2500 | TC1 | 3-hourly with data gaps (a, b) | 2500 | n/a | n/a |
| GEnSRF TC1E500_L1500I005 | TC1 | 3-hourly with data gaps (a, b) | 500 | 1500 km | 0.05 |
| GEnSRF TC2E500 | TC2 | 3-hourly (a) | 500 | n/a | n/a |
| GEnSRF TC2E500_L1500I005 | TC2 | 3-hourly (a) | 500 | 1500 km | 0.05 |

(a) Three-hourly implies observations are available at 8 time periods during the day.
(b) For the shorter towers (height ≤ 150 m) only afternoon measurements are used; for the very short towers (height ≤ 30 m) only those measurements recommended by the data providers are used.

[51] In terms of time-averaged diagnostics, the two quantitative metrics used are the root mean square difference (RMSD) and the correlation coefficient (CC). The GEnSRF and the GIM flux estimates are aggregated to a monthly timescale, and the RMSD and the CC are calculated at the native 1° × 1° spatial resolution for all grid cells across the continent. Both these quantities are reported aggregated over North America (NA) and the Mid-Continent Intensive (MCI) region. The MCI region [e.g., Lauvaux et al., 2012], shown as the green shaded area in Figure 2, was not only well constrained by a dense measurement network in 2008, but also lies in the interior of the study domain and hence is immune to biases that may arise along the boundaries of the study domain [Dirren et al., 2007]. The monthly fluxes and uncertainties are also aggregated to seven ecoregions (Figure 3) that are loosely defined based on the work of Olson et al. [2001] and demarcate large (mostly contiguous) regions with similar climate, land cover and land use.
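The two time-averaged metrics can be written compactly as follows. This is a minimal sketch on synthetic arrays; the array shapes, the noisy stand-in for the ensemble estimates, and the sub-region mask are hypothetical and serve only to show how the statistics would be computed over the full domain and over a subset of grid cells.

```python
import numpy as np

def monthly_mean(fluxes):
    """Average a (time, cell) array of 3-hourly fluxes over the time axis."""
    return fluxes.mean(axis=0)

def rmsd(a, b):
    """Root mean square difference between two grid-scale flux maps."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def corr_coef(a, b):
    """Pearson correlation coefficient between two grid-scale flux maps."""
    return float(np.corrcoef(a, b)[0, 1])

# Toy data: 240 three-hourly steps (30 days x 8) on 50 grid cells.
rng = np.random.default_rng(1)
gim = rng.normal(size=(240, 50))
gensrf = gim + rng.normal(scale=0.5, size=gim.shape)   # noisy stand-in

region_mask = np.zeros(50, dtype=bool)   # hypothetical sub-region cells
region_mask[10:20] = True

gim_m, ens_m = monthly_mean(gim), monthly_mean(gensrf)
stats_all = (corr_coef(ens_m, gim_m), rmsd(ens_m, gim_m))
stats_region = (corr_coef(ens_m[region_mask], gim_m[region_mask]),
                rmsd(ens_m[region_mask], gim_m[region_mask]))
```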

Figure 3.

Ecoregion map, modified from Olson et al. [2001], which is used for analyzing inversion results at spatially aggregated scales. Stars represent the location of the 35 tower network.

[52] In terms of diagnostics at fine time scales, the GEnSRF performance is evaluated at 3-hourly and daily time scales, aggregated spatially to the full North American domain. By domain-averaging the recovered 3-hourly fluxes, we assess the ability of GEnSRF to accurately recover the diurnal cycle of the CO2 fluxes. Daily RMSD values between the GEnSRF and the GIM grid-scale flux estimates are also examined as a function of time to evaluate the filter stability.
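A mean diurnal cycle of the kind described here could be obtained as in the sketch below. The use of area weights for the domain average, the array layout, and the function name are assumptions; the toy input is a known sinusoidal cycle so that the recovery can be checked.

```python
import numpy as np

def mean_diurnal_cycle(fluxes, area_weights, steps_per_day=8):
    """Domain-average 3-hourly fluxes, then average each time-of-day slot.

    fluxes       : (n_days * steps_per_day, n_cells) array
    area_weights : (n_cells,) grid-cell weights (assumed, e.g., cell areas)
    """
    w = area_weights / area_weights.sum()
    domain_mean = fluxes @ w                      # one value per 3-hourly step
    return domain_mean.reshape(-1, steps_per_day).mean(axis=0)

# Toy example: a known sinusoidal cycle repeated for 30 days on 4 cells.
slots = np.arange(8)
cycle = np.sin(2 * np.pi * slots / 8)
fluxes = np.tile(cycle, 30)[:, None] * np.ones((1, 4))
recovered = mean_diurnal_cycle(fluxes, area_weights=np.array([1.0, 2.0, 2.0, 1.0]))
```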

[53] The degree to which the GEnSRF fluxes reproduced the atmospheric CO2 observations was also evaluated (results not shown), but a direct comparison of GEnSRF fluxes rather than atmospheric concentrations provides a more direct measure of the impact of the numerical DA scheme.

4. Results

4.1. Multiscale Evaluation of the Ensemble Estimates for the Control Run (TC1E500)

[54] Monthly averaged grid-scale flux estimates and uncertainties for TC1 are presented in Figure 4. Qualitatively, it is clear that the control run (GEnSRF TC1E500, Figures 4c and 4d) is not capable of reproducing the monthly averaged GIM estimates or their associated uncertainties (Figures 4a and 4b). The underestimated uncertainties should not be interpreted as more confident estimates, but rather point to the problem of insufficient variance in the ensemble. While the ensemble approach correctly captures the flux estimates over the Eastern corridor and the Southern parts of the continent, its performance degrades over the Northwestern region in Alaska and Canada, where scattered sources are incorrectly inferred throughout. In a real-data application, this could have two causes: (1) the use of a limited number of ensemble members, resulting in large sampling error, and (2) the sparse network in this area, where several of the available towers are located in complex terrain in which transport is difficult to model. Given that this synthetic data study does not include transport model errors, the erroneous fluxes suggested by GEnSRF are a product of spurious ensemble noise. As indicated in Table 2, the difference in the spatial patterns between the two sets of estimates manifests itself in low CC and high RMSD between GIM and GEnSRF over North America.

Figure 4.

TC1 (top) flux estimates and (bottom) associated uncertainties aggregated to the monthly scale for (a and b) GIM and (c–h) three different GEnSRF runs.

Table 2. Correlation Coefficients (CC) and Root Mean Square Difference (RMSD; μmol/(m2 s)), Calculated Based on Grid-Scale, Monthly Averaged Flux Estimates Between the Various Runs of GEnSRF and GIM, for TC1 and TC2

| Test Case | NA: CC | NA: RMSD | MCI: CC | MCI: RMSD |
|---|---|---|---|---|
| GEnSRF TC1E500 (a) | 0.64 | 0.52 | 0.77 | 0.32 |
| GEnSRF TC1E2500 | 0.81 | 0.35 | 0.91 | 0.29 |
| GEnSRF TC1E500_L1500I005 | 0.75 | 0.37 | 0.83 | 0.35 |
| GEnSRF TC2E500 | 0.68 | 0.48 | 0.77 | 0.30 |
| GEnSRF TC2E500_L1500I005 | 0.76 | 0.39 | 0.85 | 0.32 |

NA, North America; MCI, Mid-Continent Intensive.
(a) Control runs for TC1 and TC2.

[55] Monthly averaged ecoregion-scale flux estimates and associated uncertainties are presented in Figure 5. GEnSRF TC1E500 estimates suggest a smaller sink throughout all ecoregions relative to the GIM TC1 estimates, and the 95% uncertainty bounds based on the ensemble estimate only capture the true fluxes in 4 of the 7 ecoregions. At the continental scale, the GEnSRF TC1E500 estimate (−23.8 (±3.4) gC/(m2 month)) is significantly higher than the GIM TC1 estimate (−32.8 (±2.7) gC/(m2 month)), and unlike the GIM estimate, does not capture the true flux of −30.46 gC/(m2 month).

Figure 5.

Estimated monthly averaged flux estimates and the associated uncertainties aggregated to ecoregions (Figure 3) and continental scales. The error bars represent 95% uncertainty bounds.

[56] The inferred monthly averaged diurnal cycle at the continental scale is shown in Figure 6. GEnSRF TC1E500 does not reproduce the GIM TC1 diurnal pattern, with the difference between the two estimates spiking around 0400 h and 1900 h UTC. The largest differences between the estimated diurnal cycles coincide with times with the greatest temporal gradient in the true underlying fluxes, as well as times when observation locations are coming into/out-of the TC1 network. Another mechanism that could cause these observed errors is the sampling error due to a small ensemble size which could result in spurious temporal correlations in the estimates, leading to a dampened diurnal cycle relative to GIM. Further analysis (Sections 4.2 and 4.3) suggests that the gradient in the true diurnal cycle is the better explanatory factor. Conclusions based on the inferred diurnal cycle for the MCI region, which spans a much narrower longitudinal range and therefore exhibits less smearing of the diurnal cycle, are consistent with those for the full continent (Figure S1 in auxiliary material).

Figure 6.

(top) Estimated flux diurnal cycle, and (bottom) absolute errors of the individual GEnSRF estimates with respect to the GIM estimates, aggregated to the continental scale. Also highlighted in Figure 6 (bottom) is the average observation density (light yellow denotes <10 observations, medium yellow denotes ≥10 observations) used in TC1 over the day.

[57] Overall, the conclusion from the control run is that the small ensemble size and limited observational information in TC1E500 hinder the ensemble filter's ability to reproduce GIM estimates across spatial and temporal scales. Sampling errors and sparse measurements may both result in a dramatic failure of the ensemble filter to infer fluxes.

4.2. Sensitivity to Ensemble Size, and Covariance Localization and Inflation Algorithms

[58] A straightforward solution to reducing the sampling error is to increase the ensemble size, which in effect increases the rank of the ensemble estimate of the prior error covariance matrix. In the absence of a dynamical model and at the limit of a large ensemble, the ensemble filter asymptotically approaches the Kalman filter (assuming the error characteristics remain Gaussian) at a convergence rate of 1/√N. A large ensemble (GEnSRF TC1E2500, Figures 4g and 4h) indeed appropriately reduces the spurious noise in the best estimates at fine spatial scales, and yields uncertainty estimates close to those from GIM TC1, albeit at the expense of an increase in computational cost compared to GEnSRF TC1E500 proportional to the increase in the size of the ensemble.
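The 1/√N behavior of the sampling error can be illustrated with a small Monte Carlo experiment. The sketch below is not part of the study: it uses a toy 20-element state with an identity covariance, so that any departure of the sample covariance from the identity is pure sampling error, and the function name and settings are assumptions.

```python
import numpy as np

def sample_cov_error(n_members, n_state=20, n_trials=20, seed=0):
    """Mean Frobenius-norm error of the sample covariance for one ensemble size.

    The true covariance is the identity, so any departure is sampling error.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        ens = rng.normal(size=(n_members, n_state))
        cov = np.cov(ens, rowvar=False)
        errors.append(np.linalg.norm(cov - np.eye(n_state)))
    return float(np.mean(errors))

err_100 = sample_cov_error(100)
err_500 = sample_cov_error(500)
err_2500 = sample_cov_error(2500)

# A 25-fold larger ensemble should shrink the error roughly 5-fold (1/sqrt(N)).
ratio = err_100 / err_2500
```

The slow square-root convergence is what makes brute-force increases in ensemble size expensive: a fivefold reduction in sampling error costs roughly 25 times as many ensemble members.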

[59] An alternate approach that does not carry substantial additional computational cost is to implement covariance localization and inflation, which dampen the sampling error and improve estimates of the flux uncertainties, as seen in GEnSRF TC1E500_L1500I005 (Figures 4e and 4f). The improved performance resulting from increasing the ensemble size and implementing localization and inflation is confirmed in Table 2, where GEnSRF TC1E2500 and GEnSRF TC1E500_L1500I005 both show higher CC and lower RMSD values relative to the control run. Both approaches also improve the ecoregion and continental scale results (Figure 5). The continental scale flux estimates from both GEnSRF TC1E2500 (−28.6 (±3.9) gC/(m2 month)) and GEnSRF TC1E500_L1500I005 (−33.3 (±3.7) gC/(m2 month)) capture the true CASA flux within their 95% confidence intervals.

[60] The impact of increasing the ensemble size or of implementing inflation and localization is less conclusive for the estimation of the diurnal cycle either over the full continent (Figure 6) or over the MCI region (Figure S1 in auxiliary material). From Figure 6, GEnSRF TC1E2500 captures the diurnal cycle very well initially but the error peaks around 1600 h UTC. The implementation of inflation and localization in GEnSRF TC1E500_L1500I005 does not yield a clear reduction in errors especially at 1900 h UTC, although the discontinuity observed in GEnSRF TC1E500 between 1300 h UTC and 1600 h UTC is eliminated. The overall diurnal cycle, however, becomes even more washed out and fails to capture the true amplitude of the fluxes. The lack of error reduction resulting from the implementation of inflation and localization highlights the fact that although sampling error does contribute somewhat to the errors in the estimated diurnal cycle, the dominant cause is either the variable measurement network prescribed in TC1 or the inability of a small ensemble to capture sharp gradients in the flux diurnal cycle, or a combination of both. This is problematic, especially if we were to use these estimates either for mechanistic understanding of the carbon cycle at sub-diurnal scales, or for reconciling with estimates from biospheric models.

[61] Changing the localization length scale to either 500 km or 3000 km for the 500-member ensemble negatively impacts the estimates (Table 3). A tight isotropic localization scale (GEnSRF TC1E500_L500) imposes high locality, as a consequence of which the autocorrelation information modeled in the prior covariance is completely lost. Measurements impact fluxes in their immediate vicinity, while areas in which no local observations are available are not constrained at all. It is possible though that strong localization could be imposed if a wide network of measurements were available to compensate for the loss of remote influence. On the other hand, a large localization scale (GEnSRF TC1E500_L3000) cannot significantly reduce the spurious correlations among distant flux locations. This suggests that the optimal value of the localization length scale (1500 km) may be linked with the correlation length scale of the fluxes themselves (∼1600 km, see Section 3.2). However, tests also revealed that the optimal filter length scale is a function of the size of the ensemble, with a smaller ensemble size requiring a shorter optimal length scale. This is due to the fact that if the number of ensemble members is large, the noise in the covariance estimates does not overwhelm the signal until much farther from the observations. This makes it harder to identify a universal mathematical or physical basis for selecting these length scales. Nevertheless, the correlation length scale of the fluxes can be used as a starting point for the localization length scale in future filter designs.
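The effect of distance-based localization on a noisy sample covariance can be sketched as follows. The Gaspari-Cohn fifth-order taper is used here as one common choice and is an assumption of this example, as are the 1-D transect, the 50-member ensemble, and the choice of the half-width c such that the taper reaches zero at 1500 km; how the study's localization length scale maps onto c is not specified in this section.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order taper: 1 at dist = 0 and 0 for dist >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    taper = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper

# Toy 1-D transect: 40 points spaced 100 km apart, correlation length ~1600 km.
x = np.arange(40) * 100.0
dist = np.abs(x[:, None] - x[None, :])
true_cov = np.exp(-dist / (1600.0 / 3.0))

rng = np.random.default_rng(2)
ens = rng.multivariate_normal(np.zeros(40), true_cov, size=50)   # small ensemble
sample_cov = np.cov(ens, rowvar=False)

# Schur (element-wise) product; c = 750 km zeroes entries beyond 1500 km.
localized_cov = sample_cov * gaspari_cohn(dist, c=750.0)

far = dist >= 1500.0
err_raw = np.abs(sample_cov - true_cov)[far].mean()
err_loc = np.abs(localized_cov - true_cov)[far].mean()
```

In this toy setting the taper removes the spurious long-range covariances at the price of also discarding the small true correlations beyond the cutoff, which mirrors the trade-off between tight and loose localization discussed above.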

Table 3. Correlation Coefficients (CC) and Root Mean Square Difference (RMSD; μmol/(m2 s)), Calculated Based on Grid-Scale, Monthly Averaged Flux Estimates Between the Different Runs of GEnSRF and GIM for TC1

| Test Case | NA: CC | NA: RMSD | MCI: CC | MCI: RMSD |
|---|---|---|---|---|
| GEnSRF TC1E500 (a) | 0.64 | 0.52 | 0.77 | 0.32 |
| GEnSRF TC1E500_L500 (b) | 0.62 | 0.42 | 0.90 | 0.31 |
| GEnSRF TC1E500_L1500 (b) | 0.75 | 0.37 | 0.84 | 0.32 |
| GEnSRF TC1E500_L3000 (b) | 0.69 | 0.44 | 0.82 | 0.28 |
| GEnSRF TC1E500_L1500I001 (c) | 0.75 | 0.37 | 0.84 | 0.33 |
| GEnSRF TC1E500_L1500I005 (c) | 0.75 | 0.37 | 0.83 | 0.35 |
| GEnSRF TC1E500_L1500I025 (c) | 0.75 | 0.38 | 0.79 | 0.44 |

NA, North America; MCI, Mid-Continent Intensive.
(a) Control run.
(b) Cases that specifically show the impact of localization on the final estimate.
(c) Cases that specifically show the impact of adaptive inflation on the final estimate.

[62] Likewise, the estimates are found to be sensitive to the parameters of the adaptive inflation technique, especially in terms of the recovered uncertainties over data-sparse regions and periods. As evident in Table 3, the change in CC and RMSD is small for the different starting parameters of the adaptive inflation, but the impact is more visible when the uncertainties associated with the GEnSRF are compared to the uncertainties from the GIM. For example, with very tightly constrained inflation factors (I001), GEnSRF underestimates the standard deviation of the individual flux estimates by an average of 13% relative to GIM. Conversely, with very loose prior inflation factors (I025) the initial inflation in the ensemble is large. During assimilation of subsequent observations, the ensemble should be deflated gradually. Yet for TC1, even after the full analysis, the ensemble remains over-inflated, resulting in an overestimate of the posterior standard deviations by GEnSRF by 31%. A prior inflation factor standard deviation of 0.05 provides a good balance, with uncertainties being underestimated by GEnSRF by only 4% (Figure 4f).

[63] In understanding the response of the adaptive inflation technique, two factors need to be considered: (1) the specification of a large and spatially uniform prior inflation factor uncertainty, i.e., one that does not vary between data-sparse and data-dense regions, and (2) a delayed response on the part of the adaptive inflation technique in adjusting to the changes in the measurement network as specific measurement locations come into and out of the network throughout the day. Recall that the adaptive inflation technique is based on a Bayesian inverse modeling framework; hence, its dependency on the measurement network is not surprising. Significant improvement in the performance is obtained if the inflation is damped toward 1 as a function of time. Damping the inflation value over time makes the technique less dependent on the measurement coverage, and has been successfully implemented in other operational tests of the adaptive technique [e.g., Torn, 2010].
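One simple way to damp an inflation factor toward 1 over time is a geometric relaxation, sketched below. The functional form, the damping coefficient of 0.9, and the application of one damping step per assimilation cycle are all assumptions made for illustration; the exact damping scheme used in the study is not described in this section.

```python
def damp_inflation(lam, damping=0.9):
    """Relax an inflation factor toward 1: lam_new = 1 + damping * (lam - 1)."""
    return 1.0 + damping * (lam - 1.0)

lam = 1.5
history = [lam]
for _ in range(20):          # e.g., one damping step per assimilation cycle
    lam = damp_inflation(lam)
    history.append(lam)
```

With this form, an over-inflated element that receives no further observational constraint drifts back toward an inflation of 1 rather than retaining a stale value.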

4.3. Sensitivity to the Measurement Network

[64] For any inversion framework based on Bayesian estimation theory, the addition of measurements in space and time will improve the estimation accuracy and reduce the uncertainty. As expected, increasing the temporal density of measurements in TC2 significantly improves the performance of both the GIM and GEnSRF estimates at the grid and ecoregion scales (results not shown). The continental scale flux estimates for both GIM TC2 (−31.7 (±1.9) gC/(m2 month)) and GEnSRF TC2E500_L1500I005 (−32.1 (±2.7) gC/(m2 month)) improve substantially, allowing them to capture the true CASA flux estimate (−30.5 gC/(m2 month)) within their 95% confidence intervals.

[65] Of greater interest is that the GEnSRF estimates now capture the amplitude and phase of the diurnal cycle better than in the case of TC1 (Figure 7 over the full continent and Figure S2 in auxiliary material over the MCI region), even without needing to increase the ensemble size. From Figure 7, GEnSRF TC2E500_L1500I005 estimates mirror the GIM TC2 3-hourly estimates, indicating the positive impact that the additional measurements have had on the ensemble filter, especially between 0100 h and 1600 h UTC. Comparing the errors in Figure 7 (bottom), one can see that the denser homogeneous network in TC2 plays a significant role in aiding the ensemble filter to correctly capture the diurnal cycle. However, the higher errors at 1900 h UTC still persist, showing that these errors are more likely to be attributable to the sharp gradient in the true diurnal cycle at this time than to temporal heterogeneity in the measurement network. Either hypothesis could have been supported by results from TC1, because transition times in network size coincide with times with sharp gradients in the diurnal cycle. Allowing the GEnSRF to directly estimate sub-continental spatial and sub-daily temporal patterns therefore also made it possible to identify the filter sensitivity to the measurement network prescribed in TC1 and TC2.

Figure 7.

(top) Estimated flux diurnal cycle, and (bottom) absolute errors of GEnSRF TC1E500_L1500I005 and GEnSRF TC2E500_L1500I005 with respect to the corresponding GIM estimates, aggregated to the continental scale. For TC2 the average observation density is 35 (dark yellow) but for TC1 it varies (light yellow denotes <10 observations, medium yellow denotes ≥10 observations) over the day.

[66] Results from TC2 also confirm that ensemble filter performance improves with a denser measurement network. This follows from the hypothesis stated earlier in Section 3.2, that for an under-determined inversion problem, the ensemble system is sensitive to the spatiotemporal density of the measurements. Additional runs with a temporally homogeneous 10 tower network confirmed that the total number of observations is a better determinant of ensemble performance at fine temporal scales than their temporal heterogeneity or homogeneity. Without the guidance of a dynamical model and in the absence of a rich observational constraint, the ensemble deviates from the truth, resulting in increased ensemble degeneracy and inaccurate estimates. In fact, as shown in Figure 8, it is only in the case of TC2 that the filter is stable and reaches an asymptotic level of accuracy.

Figure 8.

Time series of the Root Mean Square Difference (RMSD) between grid-scale daily averaged estimates from GEnSRF and GIM over North America for TC1 and TC2. The time series shown here is for the latter half of the assimilation cycle to emphasize that with the measurement network in TC1, the ensemble filter does not stabilize and suffers from divergence.

[67] The influence of the measurement density on the ensemble behavior can be examined using several diagnostics that are commonly available in the NWP literature. All of these diagnostics, however, require knowledge of the true state against which the ensemble mean is evaluated. In this study, the true state is available from the CASA-GFEDv2 fluxes, but in a real application this would be unknown and hence these diagnostics could not be calculated. The diagnostic selected here examines the ratio of the time-averaged ensemble spread to the error in the ensemble mean [Liu et al., 2008] at every estimation grid point, and highlights how measurement availability controls ensemble behavior. In this case, the ensemble spread is obtained as the difference between individual ensemble members and the ensemble mean, while the error in the ensemble mean is calculated as the mean squared difference from the true state. This ratio is an indication of the optimality of the DA system, and illustrates the impact of the measurements in adjusting this ratio. Figure 9 shows this ratio for GEnSRF TC1E500_L1500I005 and GEnSRF TC2E500_L1500I005. In the case of GEnSRF TC2E500_L1500I005, the ratio is close to 1.0 over most of the continent, such that on average the analysis spread among the ensemble members is consistent with the true errors, i.e., the mean squared difference between the ensemble mean and the truth. In the case of GEnSRF TC1E500_L1500I005, however, the ratio is close to 1.0 only over a small portion of the continent that is both removed from domain boundaries and relatively well constrained by observations, while in other areas the value of the ratio is near 2.0. This indicates that the ensemble overestimates the uncertainties in these regions by a factor of 2.
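A spread-to-error diagnostic of this kind could be computed as in the following sketch. It reflects one plausible reading of the description above: both the spread and the error are expressed as root mean square quantities so that their ratio is dimensionless. The array layout, the synthetic ensembles, and the function name are assumptions made for illustration.

```python
import numpy as np

def spread_error_ratio(ensemble, truth):
    """Ratio of time-averaged ensemble spread to ensemble-mean error per cell.

    ensemble : (n_times, n_members, n_cells) array of flux estimates
    truth    : (n_times, n_cells) array of true fluxes
    """
    ens_mean = ensemble.mean(axis=1)
    # Spread: RMS deviation of members about the ensemble mean, over time.
    spread = np.sqrt(ensemble.var(axis=1, ddof=1).mean(axis=0))
    # Error: RMS difference between the ensemble mean and the truth over time.
    error = np.sqrt(((ens_mean - truth) ** 2).mean(axis=0))
    return spread / error

rng = np.random.default_rng(3)
n_times, n_members, n_cells = 2000, 50, 5
centre = rng.normal(size=(n_times, 1, n_cells))

# Consistent case: truth and members are drawn from the same distribution.
truth = (centre + rng.normal(size=(n_times, 1, n_cells)))[:, 0, :]
members = centre + rng.normal(size=(n_times, n_members, n_cells))
ratio_consistent = spread_error_ratio(members, truth)

# Over-dispersed case: member spread doubled, so the ratio moves toward 2.
members_wide = centre + 2.0 * rng.normal(size=(n_times, n_members, n_cells))
ratio_wide = spread_error_ratio(members_wide, truth)
```

A ratio near 1 indicates that the ensemble spread is a reliable proxy for the actual error, whereas a ratio near 2 corresponds to uncertainties that are overestimated by about a factor of 2, as in the data-sparse parts of the domain discussed above.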

Figure 9.

The ratio of grid-scale time-averaged ensemble spread and ensemble mean error for TC1E500_L1500I005 and TC2E500_L1500I005. A ratio of 1.0 (or green color) indicates optimal data assimilation.

[68] This result is indicative of the better performance of the adaptive inflation technique with a richer observational constraint, as seen by the fact that TC2E500_L1500I005 has a reduced mismatch between the ensemble mean error and the ensemble spread, except over very sparsely observed areas like the Tundra. The dependency of the adaptive inflation technique on the spatial heterogeneity of the measurement network might seem a disadvantage at first. However, we argue that the inflation values provided by the adaptive algorithm are preferable to manually tuning the system with a single inflation value applied everywhere. Figure 10 shows the spatially dependent monthly averaged inflation factors and their uncertainties, as determined by the adaptive inflation algorithm for GEnSRF TC2E500_L1500I005. Although the time-averaged values in Figure 10 mask the significant temporal variations of the inflation factors, they do highlight the spatial structure that is clearly consistent with the spatial density of the measurement network. This spatial variability in the inflation factors underscores the need for adopting an inflation strategy that can be adjusted recursively. By contrast, if a single inflation value had been used over the entire continent, for example a value of 1.1, then this would have under-inflated the ensemble in the data-dense regions, but over-inflated it over the data-sparse regions. This would have led to additional errors in the final flux estimates and their uncertainties. Overall, results indicate that the density of the measurement network not only controls the estimation accuracy, but also ensures that the entire ensemble system and its associated algorithms function well. In the absence of a dynamical model, the measurements play an even more integral role in the assimilation process, as they drive both the ensemble mean and the ensemble spread. In practice, obtaining such a network not only requires additions to the existing monitoring network, but also improvements to atmospheric transport models that would enable the use of observations collected throughout the day, as was done in TC2.
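To make the contrast between a single, continent-wide inflation value and spatially varying inflation factors concrete, the following sketch applies multiplicative inflation to the ensemble at each grid cell. It assumes that the inflation factor scales the ensemble variance, so that deviations from the ensemble mean are scaled by its square root. The function names are hypothetical, and the adaptive estimation of the factors themselves is not shown.

```python
import math


def inflate(members, factor):
    """Inflate one grid cell's ensemble about its mean (illustrative).

    Assumption: `factor` multiplies the ensemble variance, so each
    deviation from the mean is scaled by sqrt(factor). The ensemble
    mean is left unchanged.
    """
    if factor <= 0.0:
        raise ValueError("inflation factor must be positive")
    mean = sum(members) / len(members)
    scale = math.sqrt(factor)
    return [mean + scale * (m - mean) for m in members]


def inflate_field(ensemble_by_cell, factors):
    """Apply one inflation factor per grid cell.

    ensemble_by_cell: list over grid cells of lists of member values.
    factors:          list of inflation factors, one per grid cell.
                      Passing the same value for every cell reproduces
                      a single, spatially uniform inflation.
    """
    if len(ensemble_by_cell) != len(factors):
        raise ValueError("need exactly one inflation factor per grid cell")
    return [inflate(members, f) for members, f in zip(ensemble_by_cell, factors)]


def sample_variance(members):
    """Unbiased sample variance, used here only to check the inflation."""
    n = len(members)
    mean = sum(members) / n
    return sum((m - mean) ** 2 for m in members) / (n - 1)
```

With a uniform factor, every cell's variance is scaled by the same amount regardless of how well that cell is observed, whereas a per-cell list lets the inflation follow the spatial structure of the measurement network.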

Figure 10.

Monthly averaged a posteriori inflation factor estimates and associated standard deviations for the case GEnSRF TC2E500_L1500I005. Note that the largest change in the inflation factors and the largest reduction in the prior inflation standard deviations are over areas with more measurements.

[69] An important question that has not been addressed here is the response of the ensemble to either transport error or measurements of varying quality. Given the sensitivity of the ensemble to the density of measurements, we expect differential and/or correlated model-data mismatch errors to play a key role as well. Tests with higher model-data mismatch covariance errors demonstrated that no additional degradation in the performance of GEnSRF was observed relative to that for GIM (results not shown). These tests did not include correlated observational errors, however.

5. Summary and Conclusions

[70] Application of data assimilation techniques for estimating sources and sinks of CO2 provides unique opportunities to better understand the mechanistic processes governing the carbon cycle. In this work, we examined the parameter space of the ensemble filter in terms of estimating CO2 fluxes at high spatial and temporal resolutions. A new ensemble square root filter (GEnSRF) based on the geostatistical inverse modeling technique was presented and applied to a synthetic data study over North America.

[71] The application of GEnSRF to different inversion regimes illustrates a dynamic interplay between three factors: (1) the spatial and temporal density of the measurements available to inform the filter, (2) the ensemble size, and the resultant sampling error, and (3) the implementation of covariance inflation and localization algorithms to ameliorate the latter. Together, these factors determine not only the relative precision and accuracy of the best estimates but also their associated uncertainties. For the ensemble filter to serve as an appropriate replacement for batch estimation of fine-scale fluxes, experiments in this study demonstrated that it may be necessary to have a dense network of measurements in space and time. To some extent, this bodes well for future applications with high-density remote sensing measurements of CO2. Additional studies will be necessary, however, to quantify the impact of biases, correlated errors, temporal heterogeneity, etc., in the remote sensing measurements.

[72] It can be argued that the requirement for more measurements may be relaxed if a dynamical model is developed to propagate the CO2 fluxes in time. Alternatively, if the inversion can be formulated as an over-determined problem it will be better constrained by the measurements, which would also lead to better ensemble behavior. This may be problematic, however, since by solving at large spatial and temporal scales existing deficiencies in the ensemble filter are masked, and aggregation errors grow. In the long run, solving at large spatial and temporal scales may limit methodological advancements in the design of future filters for the CO2 source-sink estimation problem.

[73] As the popularity of the ensemble filter within the carbon science community rises, future developments will most likely revolve around reducing the impact of sampling error. In this study, sampling error, which results from the use of a limited ensemble size, was the largest source of error. Sensitivity tests with different ensemble sizes established that approximately 500 ensemble members, used in combination with covariance inflation and localization, may be used for estimating 3-hourly fluxes over North America at a 1° × 1° scale. Estimates at both native and aggregated spatial scales were reliable, as were estimates at aggregated temporal scales. Capturing the diurnal cycle of the underlying fluxes proved most difficult, even when covariance inflation and localization were used. By designing inflation and localization techniques that are more closely tailored to the CO2 flux estimation problem, the requisite number of ensemble members may be reduced further to increase the computational efficiency. The two algorithms implemented here are drawn from NWP-related problems. Although they perform reasonably well, questions remain over the behavior of the adaptive inflation technique in data-sparse regions, the appropriateness of existing localization techniques, etc. In spite of these shortcomings, these algorithms can be implemented with a limited ensemble size to obtain reliable posterior CO2 flux estimates but with over-inflated uncertainties over data-sparse regions, as was done here.


[74] This work was supported by the National Aeronautics and Space Administration (NASA) through an Earth System Science Fellowship for Abhishek Chatterjee, under grant NNX09AO10H. Additional support was provided through NASA grant NNX12AB90G and a contract from Sandia National Laboratories, Albuquerque, NM, funded under a Laboratory Directed Research and Development project. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. The authors acknowledge Ave Arellano, Andy Jacobson and Derek Posselt for fruitful comments and discussions regarding this work. We especially thank Sharon Gourdji for her suggestions and help on the GIM analyses. Finally, we would like to thank the following scientists for sharing their atmospheric measurements that were used in simulating the data gaps in the observation scenario used in TC1: Arlyn Andrews, Ken Davis, Danilo Dragoni, Marc Fischer, Mathias Goeckede, Ralph Keeling, Bev Law, Natasha Miles, Bill Munger, Matt Parker, Scott Richardson, Britt Stephens, Colm Sweeney, Steven Wofsy and Douglas Worthy.