Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 1. Results using atmospheric measurements



[1] This study presents monthly CO2 fluxes from 1997 to 2001 at a 3.75° latitude × 5° longitude resolution, inferred using a geostatistical inverse modeling approach. The approach focuses on quantifying the information content of measurements from the NOAA-ESRL cooperative air sampling network with regard to the global CO2 budget at different spatial and temporal scales. The geostatistical approach avoids the use of explicit prior flux estimates that have formed the basis of previous synthesis Bayesian inversions and does not prescribe spatial patterns of flux for large, aggregated regions. Instead, the method relies strongly on the atmospheric measurements and the inferred spatial autocorrelation of the fluxes to estimate sources and sinks and their associated uncertainties at the resolution of the atmospheric transport model. Results show that grid-scale estimates exhibit high uncertainty and relatively little small-scale variability, but generally reflect reasonable fluxes in areas that are relatively well constrained by measurements. The aggregated continental-scale fluxes are better constrained, and estimates are consistent with results from previous synthesis Bayesian inversion studies for many regions. Observed differences at the continental scale are primarily attributable to the choice of a priori assumptions in the current work relative to those in other synthesis Bayesian studies. Overall, the results indicate that the geostatistical inverse modeling approach is able to estimate global fluxes using the limited atmospheric measurement network without relying on assumptions about a priori estimates of the flux distribution. As such, the method provides a means of isolating the information content of the atmospheric measurements, and thus serves as a valuable tool for reconciling top-down and bottom-up estimates of CO2 flux variability.

1. Introduction

[2] Carbon dioxide (CO2) is the primary greenhouse gas contributing to global climate change, and numerous studies have focused on developing a thorough understanding of the regional, continental, and global budgets of CO2. Although significant progress has been made in understanding the processes controlling the sources and sinks of CO2, important questions still remain regarding their magnitude, timing and geographic distribution.

[3] Increasingly, inverse modeling approaches are employed to improve estimates of carbon flux at a variety of spatial and temporal scales. These top-down approaches couple observations of CO2 with atmospheric transport models to trace concentration fluctuations back to surface fluxes at specific locations and over prescribed time periods [e.g., Rayner et al., 1999; Enting, 2002]. However, the nature of atmospheric transport (e.g., mixing, diffusion, influence of weather patterns) and its associated uncertainties reduce the information content of available observations. As a result, atmospheric inversions are generally ill-posed, with substantially different flux distributions yielding similar modeled mixing ratios at observational network sites [Enting, 2002]. In this case, uncertainties in observational data and transport models can lead to high uncertainties on estimated fluxes [Enting and Newsam, 1990; Brown, 1993; Hein et al., 1997].

[4] In order to circumvent this ill-posedness, additional information on CO2 sources and sinks is typically introduced into inversions in the form of explicit prior estimates of surface flux. This approach, commonly referred to as synthesis Bayesian inversion, typically obtains these a priori flux estimates from process-based models and/or inventories [e.g., Kaminski et al., 1999; Rödenbeck et al., 2003; Gurney et al., 2004; Baker et al., 2006]. Process-based, or biospheric models typically apply knowledge of small-scale causal mechanisms to predict carbon exchange at larger scales (e.g., Carnegie-Ames-Stanford Approach (CASA) model [Randerson et al., 1997] and Lund-Potsdam Jena (LPJ) Dynamic Global Vegetation Model [Sitch et al., 2000]).

[5] However, because the current global CO2 monitoring network is sparse, some regions of the world remain poorly constrained even after the introduction of a priori assumptions about flux distributions. Therefore, to avoid an underdetermined problem, synthesis Bayesian inversions often estimate fluxes for a small number of prespecified regions loosely based on continental boundaries [e.g., Gurney et al., 2003, 2004; Law et al., 2003; Baker et al., 2006], or, more recently, based on biomes or land cover types [Peters et al., 2007], while keeping the flux patterns within regions fixed. This approach can lead to aggregation errors [Kaminski et al., 2001], where the inferred net estimate from a region can be biased by any inaccuracies in the flux patterns assumed within a particular region. In a few cases, sources and sinks have been estimated at finer scales to reduce such errors, by including a covariance matrix that describes the assumed spatial autocorrelation between fluxes [e.g., Rödenbeck et al., 2003; Rödenbeck, 2005].

[6] Overall, flux estimates and uncertainties derived from atmospheric inversions are sensitive to a priori assumptions, such as the selection of observations, the transport model, prior information, prescribed flux patterns, and error covariance parameters. These differences lead to the observed inconsistencies between reported flux estimates from various inversion studies. There is a growing awareness of the strong influence of these assumptions, especially in regards to the use of explicit prior flux estimates from bottom-up models to define the magnitude and spatial distribution of fluxes [e.g., Michalak et al., 2004; Rödenbeck, 2005]. This influence not only contributes to aggregation errors, but can also cause a posteriori estimates to revert to prior assumptions in underconstrained regions. As such, estimates from synthesis Bayesian inversions cannot be used directly to reconcile process-based understanding of flux behavior with the information content of atmospheric observations. The sensitivity of estimates to other assumptions has also been recognized, with researchers attempting to systematically quantify the magnitude and impact of model-data mismatch and a priori flux uncertainties [e.g., Engelen et al., 2002; Engelen, 2006; Krakauer et al., 2004; Michalak et al., 2005].

[7] Owing to the strong influence of inverse modeling assumptions on estimated sources and sinks, there is a need for an inverse modeling approach for CO2 flux estimation that can more directly reflect the information content of available atmospheric measurements. Such an approach, based on a geostatistical inverse modeling framework, was proposed by Michalak et al. [2004]. This method aims to reduce the influence of modeling assumptions by (1) avoiding the use of bottom-up flux estimates for defining the magnitude and spatial patterns of fluxes, (2) estimating sources and sinks at resolutions that minimize the risk of aggregation errors, and (3) using a rigorous statistical framework for quantifying model-data mismatch and the degree of spatial autocorrelation in the flux distribution. In this manner, the approach yields CO2 flux estimates that are more strongly representative of the spatial and temporal variability of CO2 fluxes as seen through the atmospheric measurement network.

[8] This paper presents the first application of the geostatistical inverse modeling approach for estimating CO2 fluxes using atmospheric observations. The objectives of this work are to (1) explore the ability of the approach to constrain global fluxes with a level of uncertainty comparable to synthesis Bayesian inversions, (2) identify the information content of available observations with regard to the global CO2 flux distribution and its variability at various spatial and temporal scales, and (3) elucidate the impact of prior assumptions used in synthesis Bayesian inversions on flux estimates from previous studies.

[9] Monthly averaged CO2 fluxes are estimated at the resolution of the implemented atmospheric transport model, 3.75° latitude by 5° longitude, for 1997–2001, using observations from a subset of the NOAA-ESRL cooperative air sampling network. To further avoid the use of a priori assumptions, fossil fuel fluxes are not assumed known, contrary to past inverse modeling studies. Instead, this paper estimates total flux, including terrestrial, oceanic, and anthropogenic contributions, which avoids the possibility of aliasing the uncertainties and seasonality of fossil fuel emissions [Gurney et al., 2005] onto the estimated biospheric flux signal. Estimated fluxes are compared at various spatial and temporal scales to bottom-up estimates of biospheric [Randerson et al., 1997], oceanic [Takahashi et al., 2002], and fossil fuel [Brenkert, 1998] fluxes, as well as estimates from the synthesis Bayesian inversion estimates of the TransCom3 Level 3 intercomparison [Baker et al., 2006] and the Rödenbeck et al. [2003] study. A companion paper [Gourdji et al., 2008] explores the ability of auxiliary environmental variables (e.g., surface air temperature, leaf area index) to further constrain flux distributions within the geostatistical inverse modeling framework, especially at fine spatial resolutions.

[10] The remainder of the paper is organized as follows. Section 2 presents the components of the geostatistical inverse modeling approach and the corresponding system of equations. This section also outlines the approach used to optimize the model-data mismatch and spatiotemporal covariance parameters. Section 3 presents and discusses the optimized covariance parameters and inversion results at the grid and continental scales, along with a comparison to previously published flux estimates. Finally, section 4 summarizes the conclusions of the paper and presents suggestions for future research.

2. Methods

[11] The surface flux estimates presented in this paper are obtained using a geostatistical inverse modeling approach; a full description of the method and overall algorithm are presented by Michalak et al. [2004]. This section presents a summary of the method with a description of model extensions (Figure 1).

Figure 1.

Schematic of geostatistical inversion components and algorithm. White boxes indicate inversion inputs, light gray boxes indicate inversion steps, and dark gray boxes represent inversion outputs. Gray circles indicate the sequence of steps in the algorithm.

[12] In this work, the geostatistical approach is used to estimate monthly CO2 surface fluxes from January 1997 to December 2001 at a 3.75° latitude by 5° longitude resolution. Overall, the inversion estimates 3456 monthly fluxes for a total of 207,360 unknowns, using 2275 known observations from 44 measurement locations.

2.1. Geostatistical Inverse Modeling Objective Function

[13] The geostatistical approach models the flux distribution as the sum of a deterministic but unknown component, , referred to as the model of the trend, and a zero-mean stochastic component with a spatial and/or temporal autocorrelation described by the covariance matrix Q. The model of the trend defines the portion of the flux signal that can be explained by a set of covariates included in the matrix X. This spatiotemporal trend can be as simple as a constant mean, but can also include linear relationships with any number of auxiliary variables related to flux. The overall best estimates obtained through this approach minimize flux deviations from the model of the trend as well as residuals between actual atmospheric CO2 measurements and concentrations derived from the estimated fluxes. Mathematically, this solution corresponds to the minimum of a cost function, Ls,β, defined as:

equation image

[14] In this equation, s is an m × 1 state vector representing the unknown surface flux distribution. H is the n × m Jacobian matrix derived from an atmospheric transport model, representing the sensitivities of the observations z (n × 1) to fluxes s (m × 1), and R is the n × n model-data mismatch covariance matrix. In the second half of the objective function, the matrix X (m × p) defines the p components, or covariates, of the spatiotemporal trend. β is a p × 1 vector of unknown drift coefficients that scale the components in X. As such, Xequation image is the resulting model of the trend that is estimated as part of the inversion process. The covariance matrix Q is an m × m matrix representing the a priori spatiotemporal correlation structure of flux residuals not explained by . Further descriptions of these components are presented in sections 2.2 to 2.6. Final flux estimates (equation image) and drift coefficients (equation image) are obtained by minimizing equation (1) with respect to both s and β, as described in section 2.8.

[15] The primary difference between the geostatistical method and the synthesis Bayesian approach lies in the second part of the objective function. First, prior flux estimates used in synthesis Bayesian inverse modeling are replaced by the estimated model of the trend, . Second, the Q matrix has off-diagonal terms representing the a priori spatial and/or temporal autocorrelation of the flux residuals. The parameters describing this autocorrelation can be quantified through the application of the Restricted Maximum Likelihood approach, as described in section 2.7.

2.2. Observational Data (z)

[16] Fluxes are estimated using monthly averaged CO2 concentration measurements from 44 measurement locations in the NOAA Earth System Research Laboratory (ESRL) Global Monitoring Division cooperative air sampling network [Tans and Conway, 2005], as shown in Figure 2. The number of measurements for any given month ranges from 35 to 42, as some locations have missing data during the examined time period.

Figure 2.

NOAA-ESRL cooperative air sampling network measurement locations [Tans and Conway, 2005] used in the current study. Note that some locations do not have measurements for all months.

[17] The subset of the observational network used in this application is similar to that used by Rödenbeck et al. [2003], who used measurements from locations with observational data gaps of less than 2 months to ensure spatial and temporal consistency. Although this approach limits the number of measurement sites used in the analysis, it reduces the risk of flux estimates being unduly affected by monthly changes in the monitoring network [Law et al., 2003; Rödenbeck et al., 2003]. Two stations, SYO (Syowa Station, Antarctica, Japan) and GOZ (Dwejra Point, Gozo, Malta), are added to the measurement network used by Rödenbeck et al. [2003].

2.3. Transport Model (H)

[18] Linear inverse modeling requires the formulation of a Jacobian matrix, H, representing the sensitivities of observations at each measurement location-month to a pulse of CO2 emitted at each estimation location-month. This Jacobian matrix was derived from an adjoint implementation of the TM3 transport model [Heimann and Körner, 2003] which has a spatial resolution of 3.75° latitude by 5° longitude, 19 vertical levels and interannually varying winds derived from the National Centers for Environmental Prediction (NCEP) Reanalysis [Kalnay et al., 1996]. Transport information relating monthly averaged CO2 observations to monthly grid-scale fluxes were calculated by Rödenbeck et al. [2003] for 1982–2001, and the subset for 1997 to 2001 is used here.

2.4. Model of the Trend ()

[19] The flux distribution (equation image), the drift coefficients (equation image), and their respective uncertainties (equation image,equation image) are estimated concurrently as part of the inversion. Hence, the resulting estimated model of the trend, Xequation image, represents the expected flux value (i.e., mean),

equation image

[20] For this application, the structure of the trend (X) assumes a constant mean flux for all land and all ocean regions, as in the Michalak et al. [2004] study. However, in this study, the mean land and ocean flux is allowed to vary seasonally, with a different average for each calendar month. As such, the X matrix has dimensions m × 24, where within each column, all elements are zero except for ones corresponding to land or ocean grid cells for a given calendar month. Thus, the expected value of surface fluxes for any grid cell and month is ultimately represented by a single unknown β, corresponding to a particular monthly mean of land or ocean flux. The companion paper [Gourdji et al., 2008] incorporates auxiliary environmental variables into the model of the trend as covariates in order to better define grid-scale flux variability in the spatiotemporal trend.

2.5. Spatial Covariance Matrix (Q)

[21] The covariance matrix Q (m × m) defines the a priori spatiotemporal autocorrelation of flux deviations from the unknown trend () at the scale of the a posteriori flux estimates. In the current implementation, spatial but not temporal correlation is assumed a priori. Therefore, the Q matrix is block diagonal, with each block Qi,i having dimensions (mi × mi), where mi = 3456, i.e., the number of fluxes estimated every month at a 3.75° × 5° resolution. Each block represents the spatial covariance between flux residuals at all pairs of estimation locations within a given month. No a priori temporal correlation was assumed to avoid unrealistic smoothing of relatively fast events such as leaf-out in spring. The same block is repeated for all months of the inversion, i.e., Q1,1 = Q2,2 = … = Q60,60,

equation image

The covariance is assumed to decay exponentially with separation distance,

equation image

where hij is the separation distance between two estimation locations, σQ2 is the variance of flux residuals at large separation distances, and l is the correlation range parameter such that the covariance approaches zero for separation distances on the order of 3l. The choice of the exponential covariance function is based both on the work of Michalak et al. [2004] and on a variogram analysis of the spatial variability of typical land and ocean bottom-up estimates. An exponential model assumes spatial correlation while also allowing for continuous but not differentiable small-scale variability. In this work, spatial correlation is assumed among land and ocean flux residuals but not between them, as different processes drive CO2 fluxes in each domain.

[22] The parameters σQ2 and l are optimized using a Restricted Maximum Likelihood approach, as described in section 2.7. Because the deterministic component of the flux distribution (Xequation image) is constant for a given month for both land and ocean fluxes, the spatial covariance of the flux residuals simply represents the autocorrelation of the fluxes themselves for this particular application. Note that this is not the case for a more complex model of the trend, as presented in the companion paper [Gourdji et al., 2008].

2.6. Model-Data Mismatch Covariance Matrix (R)

[23] The model-data mismatch covariance matrix R is a diagonal matrix whose elements represent the variances associated with measurement, transport, and representation errors [Engelen et al., 2002; Engelen, 2006] for each observation location-month. In this study, the variances in R are obtained by optimizing a single scaling factor (c) applied to a vector of squared residual standard deviations (RSDs) for each measurement location. The RSDs are monthly averaged deviations from a smooth curve fitted to all observations at every location [GLOBALVIEW-CO2, 2008]. This setup is similar to that used by Gurney et al. [2004], and assumes that the model-data mismatch uncertainty associated with each of the 44 locations scales proportionately with their squared RSD. This approach has been shown to yield results that are comparable in terms of residual statistics to more complex setups that separate measurement locations into multiple subgroups [Michalak et al., 2005]. In this setup, R is defined as

equation image

where image are the squared RSDs for available observation locations for a given month.

2.7. Restricted Maximum Likelihood (RML) Approach for Optimizing Covariance Parameters

[24] The RML approach [Kitanidis, 1986, 1995; Michalak et al., 2004] is a statistical technique for quantifying covariance parameters by maximizing the likelihood of available data y. In practice, the RML approach minimizes the negative logarithm of this likelihood with respect to the covariance parameters θ, yielding the following objective function:

equation image

[25] In the current application, the R and Q covariance parameters are optimized in separate steps. Each step has a different definition of F and Ψ and thus, these symbols are described separately below.

2.7.1. Optimization of a Priori Spatial Covariance Parameters

[26] In the first step, covariance parameters in Q (θ = {σQ,land2, lland, σQ,ocean2, locean}) are optimized using Net Ecosystem Production (NEP) from CASA [Randerson et al., 1997], oceanic carbon exchange from Takahashi et al. [2002], and fossil fuel emissions from Brenkert [1998]. In this setup, y represents the bottom-up flux estimates, F = X (the components of the model of the trend outlined in section 2.4), and Ψ= Q (the a priori autocorrelation of the flux distribution described in section 2.5).

[27] Whereas Michalak et al. [2004] demonstrated the feasibility of inferring spatial flux variability directly using atmospheric measurements, the current study uses bottom-up flux estimates to optimize these covariance parameters. This change was made because the inverse problem solved here is even more highly underdetermined than the earlier study, with the number of estimated fluxes outnumbering the number of observations by 100 to 1. As a result, the covariance parameters that could have been inferred from the atmospheric observations may not have captured the true scales of variability in the flux distribution. Although the magnitude of the bottom-up fluxes may not be accurate at specific locations, these fluxes are better able to represent the amount of spatial variability at the resolution of the atmospheric transport model (i.e., 3.75° × 5°) than that inferred solely from the sparse atmospheric data themselves.

[28] Additional tests were run to determine the sensitivity of the a posteriori flux estimates to the choice of bottom-up estimates used for defining the degree of spatial variability, and the optimization method used to quantify the covariance parameters. Results show that a posteriori flux estimates and uncertainties are relatively robust to these differences, especially at aggregated scales (results not shown).

2.7.2. Optimization of the Model-Data Mismatch Covariance Scaling Parameter

[29] In the second step, the atmospheric observations are used to estimate the scaling factor (θ = {c}) applied to the squared RSD's in the model-data mismatch covariance matrix (R). In this second step, y = z (or the atmospheric concentration measurements), F = HX (or the components of the trend, projected onto the measurement space through the transport matrix), and Ψ = HQHT + R (or the covariance of deviations from the trend HX).

[30] To verify that the optimized scaling factor (c) is consistent with the setup of the inversion, the χR2 statistic is calculated using the difference between conditional realizations of estimated fluxes transported into measurement space (Hsci) and the available atmospheric measurements (z),

equation image

where n represents the total number of observations used in the inversion. The conditional realizations of the flux distribution represent equally likely flux scenarios given the information content of the available measurements. The use of conditional realizations allows the χR2 to be evaluated individually for each measurement location, as described by Michalak et al. [2005]. The χR2 values close to 1 indicate that the a posteriori fluxes are able to reproduce the measurements to the degree specified by values in R.

2.8. Geostatistical Inversion System of Equations

[31] A posteriori flux estimates are obtained by minimizing the objective function, Ls,β (equation (1)), with respect to s and β. The minimum can be obtained by solving the following system of equations [e.g., Michalak et al., 2004]:

equation image


equation image

[32] The weights, Λ (m × n), and Lagrange multipliers, M (p × m), are used to define the a posteriori best estimates of the flux distribution (equation image) and their a posteriori uncertainty covariance (Vequation image) as

equation image
equation image

[33] The (mi × mi) diagonal blocks of equation image represent the estimated a posteriori error covariance of fluxes within a given month, whereas off-diagonal blocks represent a posteriori temporal error covariances between months. The diagonal elements of equation image represent the a posteriori uncertainties of the individual fluxes in equation image.

[34] Estimates of the drift coefficients (equation image), and their a posteriori error covariance (Vequation image) are given by

equation image
equation image

[35] The diagonal elements of Vequation image (p × p) represent the posterior uncertainty of the drift coefficients, while the off-diagonal terms in Vequation image represent the estimated error covariance between these coefficients.

3. Results and Discussion

3.1. Optimized Covariance Parameters

[36] The optimized covariance parameters used to construct the a priori (Q) and model-data mismatch (R) covariance matrices are presented in Table 1. Results show that the inferred flux variability at a 3.75° × 5° resolution is higher for terrestrial fluxes relative to oceanic fluxes, with a land variance (σQ,land2) 2 orders of magnitude higher than that for the oceans (σQ,ocean2) and a terrestrial flux correlation length (lland) approximately half that of oceanic fluxes (locean). This inferred regional variability is consistent with previous assessments of ocean and land fluxes that have shown that terrestrial fluxes are much more variable than their oceanic counterparts [e.g., Bousquet et al., 2000]. Note that the variability at a 3.75° × 5° resolution may be different than that observed at finer spatial scales, because processes that drive small-scale fluxes, such as regional droughts or biomass burning, are averaged out over larger regions.

Table 1. Optimized Model-Data Mismatch and Spatial Covariance Parameters With ±1 Standard Deviationa
Covariance MatrixSill Variance σ2, [μmolCO2/(m2s)]2Correlation Length l, kmScaling Factor c
  • a

    Model-data mismatch, R; spatial covariance, Q.

Qland0.40 ± 0.032700 ± 200-
Qocean0.0030 ± 0.00035700 ± 500-
R--0.63 ± 0.04

[37] The estimated correlation lengths presented here are longer than those used by Rödenbeck et al. [2003] (henceforth referred to as CR03), which were 1275 km for land and 1912 km for oceans. These dissimilarities may be due to the differences in covariance parameter optimization schemes, bottom-up fluxes used to assess flux variability, and/or other constraints. For example, CR03 constrained the total amount of global a priori uncertainty to that reported for global land and ocean flux estimates by the Intergovernmental Panel on Climate Change [2001], and then downscaled this amount to the grid scale. Regardless, the difference in correlation lengths implies that the scale over which measurements are assumed to be representative of the underlying flux distribution was smaller in the CR03 study relative to this paper. Because the spatial scales of flux variability may change from one geographic location to another, and different factors may drive this variability at smaller versus larger scales, it is difficult to directly validate the correlation length estimates presented in either of the two studies. However, the differing assumptions regarding a priori information on a posteriori results from these studies will be further examined in section 3.4.2.

[38] The optimized scaling factor (c), also presented in Table 1, produces model-data mismatch variances at individual measurement locations that range from 0.09 ppm to 5.3 ppm. The majority of these variances are either similar to or somewhat higher than those employed by CR03 and Baker et al. [2006] (henceforth referred to as DFB06) for coincident locations (Figure 3). One reason for the higher model-data mismatch estimates in the current work is the fact that DFB06 used smoothed Globalview [GLOBALVIEW-CO2, 2008] measurements, which are generally easier to reproduce than the flask measurements used here. In addition, the geostatistical inversion presented in this work uses an a priori constant spatial mean for land and oceans per month, which may not be able to represent small-scale flux variability relative to studies that use explicit prior flux estimates, particularly in underconstrained regions.

Figure 3.

Model-data mismatch standard deviation for observation locations used in this study and coincident locations used in CR03 and DFB06.

[39] The optimized scaling factor (c) and the resulting model-data mismatch uncertainties were further evaluated using the χR2 statistic for each measurement location using conditional realizations of the a posteriori flux distribution, as described in section 2.7 and by Michalak et al. [2005]. The χR2 statistic averaged over all measurement locations is 1.0, indicating that measurements are reproduced to the degree assumed by the optimized model-data mismatch covariance matrix. Because the χR2 statistic in this application is calculated using conditional realizations of flux, this statistic can also be evaluated for individual measurement locations [Michalak et al., 2005], as shown in Figure 4. Figure 4 demonstrates that most locations have χR2 values that cluster around 1.0 with most deviating by less than ±0.5. The χR2 values for marine boundary layer sites have less scatter than for continental sites. Two locations (Easter Island, Chile (EIC) and Hegyhatsal, Hungary (HUN)) in particular have relatively large χR2 values. As such, the amount of uncertainty specified in the R matrix overestimates the ability of the inversion to reproduce measurements at these specific locations as noted by previous studies [e.g., Law et al., 2003]. In future work, a more complex structure could be considered for the R matrix, similar to those examined by Michalak et al. [2005], to account for the additional uncertainty at these locations.

Figure 4.

The χR2 for each observation location calculated from conditional realizations of the a posteriori fluxes resulting from inversion with optimized covariance parameters. The solid line represents χR2 = 1.0, which is also the mean χR2 across stations.

3.2. Drift Coefficients (equation image) and Uncertainties (equation image)

[40] The drift coefficients and their associated uncertainties, representing monthly averages of terrestrial and oceanic fluxes, are presented in Figure 5. As emphasized in section 2.4, the drift coefficients are estimated as part of the geostatistical inversion and therefore reflect the information content of the atmospheric measurements.

Figure 5.

Estimated monthly land and ocean constants (equation image) (+/−equation image in solid lines and +/−2σequation image in dashed lines). For the model of the trend used here, these coefficients represent the average monthly flux (Xequation image) for land and ocean.

[41] The terrestrial drift coefficients representing monthly land averages, including both fossil fuel emissions and biospheric sources and sinks, show a seasonal cycle which is noticeably more representative of the behavior of the Northern Hemisphere fluxes. This is an expected result given that the Northern Hemisphere is better constrained by atmospheric measurements, and that a larger fraction of land mass is north of the equator. Conversely, the average ocean monthly fluxes lack a strong seasonality. These drift coefficients indicate that the oceans act as significant net sinks of CO2 in March, July, August and September. As will be discussed in section 3.4.1, the magnitude of this sink may be partially due to aliasing of the Northern Hemisphere photosynthetic signal onto the oceans during these months. Nevertheless, the ocean coefficients generally agree with other estimates of average ocean source/sink behavior [e.g., Takahashi et al., 2002]. The uncertainty bounds for the ocean coefficients are narrower relative to their terrestrial counterparts, primarily because the inferred oceanic flux variability is smaller than that of land fluxes as indicated by the longer correlation length, locean, and smaller sill variance, σQ,ocean2, parameters presented in Table 1. This low variability implies that limited knowledge about oceanic fluxes is sufficient to inform their overall mean behavior.

3.3. A Posteriori Grid-Scale Flux Estimates (equation image) and Uncertainties (equation image)

[42] The a posteriori flux estimates, including anthropogenic sources and their associated uncertainties, are shown in Figure 6 at the recovered flux resolution of 3.75° × 5° for the sample months of January and July, 2000. In well-constrained areas, particularly in the Northern Hemisphere, the geostatistical inversion is able to estimate fluxes that generally correspond well with current understanding of CO2 sources and sinks. While the uncertainties are too large to make definite conclusions about the sign of the flux at the grid-scale resolution, especially given the limited measurement network used in this study, the main objective of estimating fluxes at this fine scale is to obtain a set estimates that can be aggregated to larger resolutions, in a manner that minimizes aggregation errors associated with estimating directly at coarser scales. As expected, areas of low uncertainty are generally located around measurement locations in regions defined in the TransCom analysis (DFB06) as Temperate Asia, Europe and Temperate North America (Figure 7).

Figure 6.

Flux estimates (equation image) for (a) January and (b) July 2000 and (c, d) their associated uncertainties (equation image). Observation locations for each month are shown as white dots. Note that the grid-scale fluxes should be interpreted together with their standard deviations, because many areas have high a posteriori uncertainties (e.g., Antarctica).

Figure 7.

Locations of 11 land and 11 ocean TransCom regions [e.g., Gurney et al., 2003].

[43] Note that the inversion was designed to estimate fluxes everywhere, including ice-covered regions, which are generally assumed to have no significant sources or sinks of CO2. Although these regions could easily have been left out of the inversion, including them provides an opportunity to assess the performance of the approach for areas where fluxes are considered well known. If estimates for Antarctica had shown significant fluxes at any spatial and temporal resolution, for example, this would have indicated a bias in the inversion setup or transport model. In the presented results, none of the grid-scale estimates for ice-covered regions are significantly different from zero at the 1equation image confidence level. This also holds true when estimates are aggregated to continental and/or annual scales, which lends support for the ability of the approach to accurately identify fluxes for these regions.

3.4. Aggregated Comparison to Existing CO2 Flux Estimates

[44] Fluxes and their associated uncertainties are aggregated to the regions used in the TransCom intercomparison study [Gurney et al., 2003, 2004; Law et al., 2003; Baker et al., 2006] (Figure 7). Given that flux estimates cannot be independently validated, the aggregated a posteriori geostatistical fluxes are then compared to those estimates from three previous studies (one bottom-up and two inversion applications) to evaluate the ability of a specific inversion method to constrain continental-scale fluxes, and to identify areas where all estimates are in agreement with one another. Specifically, inferred fluxes are compared to (1) an aggregated set of 1° × 1° bottom-up flux estimates (described in section 2.7.1) with regional corrections for deforestation and regrowth, (2) the TransCom 3 Level 3 intercomparison study (DFB06) where monthly flux deviations were recovered at the continental scale, and (3) the CR03 study, which estimated monthly flux deviations at a 7.5° latitude by 10° longitude resolution. Table 2 outlines the major components used in these top-down inversions compared to the geostatistical approach, in order to clarify the assumptions used in each study.

Table 2. Comparison of Inversion Setups
Resolution3.75° latitude × 5° longitude, monthly fluxes (1997–2001).22 TransCom regions, monthly fluxes (1988–2003). Monthly continental flux estimates assume pre-specified flux patterns as determined from a compiled set of bottom-up flux estimates [Randerson et al., 1997; Takahashi et al., 1999].7.5° latitude × 10° longitude, monthly fluxes (1982–2001).
Measurement network44 measurement locations from NOAA/ESRL sampling network [Tans and Conway, 2005].78 measurement locations from GlobalView-CO2 [2008].11–44 measurement locations from NOAA/ESRL sampling network [Tans and Conway, 2005].
Transport modelTM3 [Heimann and Körner, 2003] using National Centers for Environmental Prediction (NCEP) Reanalysis winds.13 different transport models.TM3 [Heimann and Körner, 2003] using National Centers for Environmental Prediction (NCEP) Reanalysis winds.
Model-data mismatch covarianceScaling factor (c) applied to square of the residual standard deviations (RSDs) for each observation location, optimized using Restricted Maximum Likelihood (RML).Sum of (1) transport model error determined from the spread of the a posteriori fluxes using different transport models, and (2) measurement error as described in DFB06.Sum of (1) measurement error of variance 0.16/n where n is the number of data values per month (max n = 8), and (2) model error as estimated from analysis of forward model run of prior flux estimates.
Prior flux distributionNo prior assumptions regarding flux distributions; a set of means and/or covariates are included a priori in the X matrix. For this application, X contains 24 columns representing an assumption of constant monthly means over land and oceans. The drift coefficients (βs) that define the means are estimated as part of the inversion.Prior estimates derived from (1) fossil fuel emissions (estimated from [Brenkert, 1998 and Andres et al., 1996]), (2) CASA fluxes [Randerson et al., 1997] and (3) ocean fluxes [Takahashi et al., 1999] from pCO2 measurements. Priors also include deviations from neutral biosphere based on knowledge of deforestation in the tropics and a upper mid-latitude sink.Prior estimates derived from (1) fossil fuel emissions (based on EDGAR database [Olivier and Berdowski, 2001]), (2) mean seasonal cycle LPJ biosphere modeled fluxes (i.e., Lund-Potsdam Jena (LPJ) Dynamic Global Vegetation Model [Sitch et al., 2000]) and (3) ocean fluxes from Gloor et al. [2003] from an inversion of ocean carbon data, and from Takahashi et al. [1999] based on pCO2 measurements.
Prior flux covarianceBased on correlation of flux residuals, modeled using an exponential covariance function. Parameters estimated separately for land and ocean regions using RML.Seasonally varying land a priori uncertainty from Gurney et al. [2004]. Ocean uncertainty from Gurney et al. [2003] with an additional factor of (0.5 GtC/yr)2.Based on correlation of residual flux values using an exponential covariance model. Correlation lengths based on statistical analysis of model inter-comparisons. Variances based on global uncertainty estimates, downscaled for land areas using a flux intensity factor based on annual NPP.

[45] Note that the geostatistical inversion method relies more heavily on the information content of the atmospheric CO2 measurements relative to the other examined studies. Therefore, consensus among results would indicate regions where fluxes can be assumed to be relatively well understood, and therefore insensitive to the assumptions inherent in each study. In areas where the surface flux estimates vary, the impact of model assumptions on each estimate are explored. However, future research and/or more measurements may be required in order to reconcile CO2 budgets.

3.4.1. Continental-Scale Seasonal Flux Comparison for Year 2000

[46] Figure 8 shows that the continental-scale geostatistical fluxes are comparable to both the bottom-up and DFB06 estimates. CR03 monthly flux estimates were not available and are therefore not shown. Figure 8 also demonstrates that the seasonality of fluxes from DFB06 and the geostatistical inversion agree particularly well for better-constrained regions (e.g., Europe and Australia), suggesting that the seasonality in these areas is relatively well understood. However, even in regions that are not well constrained by the current observational network, such as in Northern and Southern Africa, these two sets of flux results generally have similar magnitudes and seasonal variation. These results support the contention that a geostatistical inverse modeling approach can be used to recover fluxes with comparable accuracy and precision to existing synthesis Bayesian approaches, without relying on bottom-up flux estimates to define the magnitudes and spatial patterns of prior information.

Figure 8.

Monthly best estimates (equation image) aggregated to 22 TransCom regions with 1equation image confidence intervals for year 2000 for geostatistical inversion, DFB06, and bottom-up estimates [Randerson et al., 1997; Takahashi et al., 2002; Brenkert, 1998] used as priors by DFB06. DFB06 estimates include fossil fuels from Brenkert [1998] for consistency.

[47] Figure 8 also shows that in many underconstrained areas where there are differences between estimates, both the DFB06 and the geostatistical results tend toward their prior assumptions, which are respectively the bottom-up flux estimates and the geostatistical model of the trend (Xequation image). An example of this can be seen in the Tropical East Pacific, South Pacific, and Tropical West Pacific, where there is a lack of atmospheric observations. The geostatistical estimates for these regions covary strongly, principally because the fluxes themselves tends to revert to the model of the trend (Xequation image). In particular, flux estimates in many of these regions follow the seasonality reflected in the oceanic mean flux as shown in Figure 5. The DFB06 estimates, in contrast, reflect their respective prior flux estimates, i.e., bottom-up estimates from Takahashi et al. [2002]. Since Takahashi et al. [2002] fluxes are based on extrapolated ship-track data, the ocean uptake predicted by the geostatistical inversion for these regions (TEPa, SoPa, TWPa) likely reflects a lack of observational data rather than a true departure from previous estimates. Overall, however, this result suggests that flux estimates for TEPa, SoPa, and TWPa obtained by previous inversion studies were strongly determined by a priori fluxes rather than the atmospheric CO2 observations.

[48] As shown by Figure 8, the continental-scale estimates for the northern high-latitude regions (e.g., Boreal North America, Boreal Asia, Northern Pacific, and Northern Atlantic) differ from past estimates, particularly in July, August and September. As mentioned in section 3.2, the geostatistical a posteriori estimates may be affected by aliasing of a terrestrial photosynthetic signal onto adjacent ocean areas during the Northern Hemisphere summer months. This land-ocean aliasing cannot be directly measured, but it can be qualitatively seen most clearly in the North Pacific and North Atlantic, i.e., in ocean regions that are contiguous with land masses that exhibit strong flux seasonality. This aliasing can also be observed to a limited extent in past synthesis Bayesian inversion studies (e.g., DFB06), but tight a priori constraints on ocean fluxes limit the size of this effect, despite the fact that the atmospheric observations do not have sufficient information to accurately partition land and ocean fluxes. This observation points to difficulties in using an atmospheric transport model to correctly partition land and ocean signals during specific months in these regions using the current measurement model. However, there is also a possibility that a portion of the correlation between adjacent land and ocean regions observed in the current study is reflective of true flux behavior in these two domains. Note that the constant monthly mean assumption for land and ocean regions used in this study is not the cause of this aliasing, because results shown in the companion paper [Gourdji et al., 2008] indicate that using a more complex model of the trend to capture more of the expected spatial variability of fluxes over land reduces the observed correlation between land and ocean fluxes, but does not eliminate it.

[49] Because the geostatistical method accounts for spatial correlation in the flux distribution, which effectively allows the application to use the measurement information over larger scales compared to the DFB06 study, the uncertainty bounds on the geostatistical flux estimates (equation imageequation image) are also typically either comparable to, or narrower than, those from the previous work at this aggregated scale. The relatively small variability of the ocean fluxes, as inferred by RML and specified in the a priori spatial covariance matrix Q, also translates into narrower a posteriori uncertainty bounds on the geostatistical ocean flux estimates. Conversely, the geostatistical uncertainties are higher than those of the DFB06 study in a few well-constrained terrestrial regions (e.g., Europe and Australia), principally because the TransCom study used more measurement locations in these areas.

3.4.2. Interannual Flux Variability Comparison

[50] Figure 9 presents mean-deviated annual moving averages of the a posteriori flux estimates for the geostatistical inversion, DFB06 and CR03. To be consistent with the geostatistical estimates, fossil fuel emissions from Brenkert [1998] and an interannual fossil fuel component from DFB06 were added to the fluxes from the latter two studies. The plot suggests that there is good agreement between the three studies with regard to the terrestrial interannual variability for most regions. One exception is Tropical America, where the CR03 fluxes display much more interannual variability relative to the other two sets of estimates. This may be due to the higher a priori uncertainty used by CR03 for this region relative to both the DFB06 study and the amount of variability assumed by the a priori covariance for the geostatistical application. As a result, the interannual variability of the DFB06 and geostatistical inversion results for this area may be more realistic. For most other regions, the geostatistical inversion recovers interannual variability that is comparable to DFB06 and CR03, particularly in better-constrained regions.

Figure 9.

Mean-deviated deseasonalized fluxes with 1equation image confidence intervals for 1997–2001 for geostatistical inversion, DFB06, and CR03. DFB06 and CR03 estimates include fossil fuels from both Brenkert [1998] and an interannual component as specified by DFB06 for consistency. Confidence intervals for all estimates are at 1σŝ. Estimates are presented for the 22 TransCom regions.

[51] For the ocean regions, the geostatistical mean-deviated fluxes show little interannual variability relative to fluxes from DFB06 and CR03. This difference is only significant in the Temperate East Pacific and South Pacific, where fluxes are highly influenced by the El Niño–Southern Oscillation (ENSO). In these regions, the DFB06 results show a significantly greater CO2 uptake than the others from the beginning of 1997 to the middle of 1998, and more outgassing in 2000. DFB06 used a richer measurement network in this region than either of the other two studies, which may have helped to inform fluxes during these years [e.g., Patra et al., 2005].

[52] As with the seasonal results presented in section 3.4.1, Figure 9 shows that the geostatistical inversion flux estimates have comparable uncertainty bounds (σequation image) to those from CR03 or DFB06, except in regions where TransCom used a more extensive observational network (e.g., Temperate Asia, Australia, Tropical Asia, and Europe). Differences in correlation length likely have little impact on the difference in a posteriori uncertainties between the geostatistical inversion and the CR03 study at this scale, because the geostatistical inversion was found to be relatively insensitive to this parameter (within the range examined by these two studies) for fluxes aggregated to continental resolutions (section 2.7.1). Instead, differences in the a posteriori uncertainty relative to CR03 are likely due to the a priori uncertainties used in the two studies, and the fact that CR03 assumed a spatially variable a priori land uncertainty (σQ,land2) proportional to Net Primary Production (NPP). These results reinforce the fact that a posteriori uncertainties reflect not only the information provided by atmospheric measurements, but also the a priori covariance assumptions.

3.4.3. Annually Averaged Aggregated Sources and Sinks

[53] Figures 10a and 10b present annually averaged, aggregated land and ocean flux estimates for the 22 TransCom regions for the geostatistical inversion, DFB06, CR03, and the bottom-up fluxes. All fluxes represent averages for the period 1997 to 2001. Unlike other presented results, annual averages of fossil fuel emissions from Brenkert [1998] were subtracted a posteriori from the geostatistical estimates, in order to make them comparable to the biospheric fluxes reported by CR03 and DFB06. The uncertainty bounds, however, include the total uncertainty estimated for the sum of these two flux components. Because annually averaged fossil fuel emissions are better understood than their seasonality [Gurney et al., 2005], subtracting inventory fossil fuel emissions from the geostatistical estimates at the annual scale should provide an accurate estimate of the biospheric fluxes inferred using the presented approach.

Figure 10.

Mean (a) biospheric and (b) oceanic flux (GtC/a) for geostatistical inversion, DFB06, CR03, and bottom-up flux estimates [Randerson et al., 1997; Takahashi et al., 2002]. Solid and dashed lines represent 1equation image and 2equation image, respectively. Fluxes from all studies are averaged from 1997 to 2001. Estimates are presented for the 22 TransCom regions.

[54] For seven TransCom continental regions (Boreal North America, Temperate North America, Northern Africa, Boreal Asia, Temperate Asia, Tropical Asia and Australia), none of the three inverse modeling studies yield fluxes that are significantly different from zero, and the estimated fluxes vary among themselves by less than one GtC/a. For three of the other continental regions (Tropical America, South America, Southern Africa), the large differences (significant at 1equation image for the first region) in both sign and magnitude between DFB06 and CR03 may be due to their use of different terrestrial prior flux estimates (i.e., CASA estimates [Randerson et al., 1997] in DFB06 versus LPJ estimates [Sitch et al., 2000] in CR03). For example, whereas LPJ estimates a 0.1 GtC/a sink in Tropical America, CASA predicts a 0.56 GtC/a source. In contrast, given that the geostatistical estimates better reflect the information content of the atmospheric data, results tend to show intermediate values for these regions.

[55] However, as shown in Figure 10a, estimates for Europe significantly vary (at 1equation image). The use of different measurements for this relatively well-constrained region across studies most likely explains this discrepancy. For example, an additional measurement location (GOZ) was used in the geostatistical inversion compared to the CR03 study. The stronger European sink in the DFB06 study may also be related to their use of an expanded measurement network. However, Michalak et al. [2005] also showed that flux estimates for Europe were highly sensitive to the choice of a priori flux and model-data mismatch uncertainties for synthesis Bayesian inversions. Overall, the comparison points to the considerable influence of the choice of observations, inversion set-up and prior flux estimates on inferred fluxes, even for regions that are generally considered to be well-constrained by the measurement network. This is especially true when looking at net annual fluxes, which represent a relatively small residual between large seasonal sources and sinks.

[56] For all ocean regions, the geostatistical annually averaged fluxes show significant sinks with little variation between estimates. Particularly, the Tropical Indian and the Tropical East Pacific ocean geostatistical estimates reflect more neutral results than those from the other studies, with DFB06 and CR03 reporting a significant source of around 0.6 GtC/a for the latter region. This is consistent with the seasonal results presented in section 3.4.1, which suggests that the geostatistical oceanic fluxes tend to be influenced by the estimated monthly mean for poorly constrained regions. An analysis of the off-diagonal a posteriori covariance terms (Vs) aggregated to the 22 TransCom regions shows that oceanic flux estimates in poorly constrained regions rely on the long correlation lengths specified in the a priori spatial covariance matrix (Q) due to the lack of atmospheric observations. The large positive a posteriori cross covariances for adjacent ocean regions also suggest that the confidence bounds shown in Figure 10b may be underestimated for these regions.

[57] Finally, note that the companion paper [Gourdji et al., 2008] shows that the annually averaged estimates presented here are consistent with those from an inversion that includes auxiliary environmental variables in the model of the trend for most continental-scale land regions.

4. Conclusions

[58] This paper presents the first application of a geostatistical inverse modeling approach for estimating global monthly fluxes of CO2 using atmospheric CO2 concentration data, without the use of predefined flux patterns or a priori assumptions about flux magnitudes. Results demonstrate that the existing atmospheric monitoring network can be used to estimate surface fluxes and their associated uncertainties at a 3.75° resolution, which limits aggregation errors inherent to inversions conducted at coarser scales. A posteriori fluxes aggregated to the 22 TransCom regions have uncertainties that are comparable to those reported by previous synthesis Bayesian inversions at monthly and interannual timescales. Overall, this work demonstrates that the presented approach provides a valuable data-driven alternative to synthesis Bayesian inversion methods, by avoiding many a priori assumptions inherent to aggregation, uncertainty estimation, and the magnitude and spatial patterns of flux distributions.

[59] At the grid scale, geostatistical flux estimates are most influenced by the limited information content of the available atmospheric measurements, and therefore have correspondingly large uncertainties. At this resolution, flux distributions reflect the assumption of a constant model of the trend, and rely more heavily on the inferred autocorrelation of the flux distribution, yielding smooth spatial variability. Conversely, synthesis Bayesian inversions tend to revert to their own prior assumptions about flux variability at this scale, but this variability is prescribed a priori and is also not derived from the information provided by atmospheric data.

[60] The value in the presented approach is that it provides strongly atmospheric data-driven estimates of surface fluxes, which has several potential additional benefits. First, the flux estimates and uncertainties provide a valuable basis for comparison to estimates from other inverse modeling studies, which can help explain the influence of model assumptions on recovered fluxes. Second, by limiting the number of a priori assumptions, the geostatistical approach may highlight potential difficulties inherent to inverse modeling approaches that may otherwise go unnoticed. The observed possible land-ocean aliasing provides one example, suggesting that either this behavior had been previously undetected or that the limited atmospheric measurement network used here is not able to fully differentiate land and ocean fluxes in Temperate North America, Boreal Asia and adjacent ocean basins. In addition, results show that the limited atmospheric network does not provide independent information about ocean fluxes for large areas of the Earth, further highlighting the need for additional observations in the global oceans. Overall, the presented approach provides an ideal basis for further work toward reconciling top-down and bottom-up estimates of fluxes, because, contrary to synthesis Bayesian inversions, it yields estimates that are independent of explicit prior flux assumptions based on bottom-up estimates.


[61] The authors gratefully acknowledge Christian Rödenbeck, David Baker, Adam Hirsch, Deborah Huntzinger, Alex Martin, and the entire PUORG research group for important feedback on this manuscript, and Pieter Tans for discussions about this work. In addition, the authors thank NOAA-ESRL for providing the atmospheric CO2 concentration data used in this work, Christian Rödenbeck for providing the transport matrix and flux estimates from his 2003 study, Kevin Gurney for providing measurement residual standard deviation values for the examined measurement locations, and David Baker for the TransCom 3 Level 3 estimates. This material is based upon work supported by the National Oceanic and Atmospheric Administration under contract RA133R-05-SE-5150 “Geostatistical Analysis of NOAA Climate Monitoring and Diagnostics Laboratory Carbon Dioxide Data for 1997–2001” issued by the Climate Modeling and Diagnostics Laboratory, now part of the Earth System Research Laboratory. Additional support was provided by the National Aeronautics and Space Administration under grant NNX06AE84G “Constraining North American Fluxes of Carbon Dioxide and Inferring Their Spatiotemporal Covariances through Assimilation of Remote Sensing and Atmospheric Data in a Geostatistical Framework” issued through the ROSES A.6 North American Carbon Program.