Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 2. Results including auxiliary environmental data



[1] Geostatistical inverse modeling has been shown to be a viable alternative to synthesis Bayesian methods for estimating global continental-scale CO2 fluxes. This study extends the geostatistical approach to take advantage of spatially and temporally varying auxiliary data sets related to CO2 flux processes, which allow the inversion to capture more grid-scale flux variability and better constrain fluxes in areas undersampled by the current atmospheric monitoring network. Auxiliary variables are selected for inclusion in the inversion using a hypothesis-based variable selection method, and are then used in conjunction with atmospheric CO2 measurements to estimate global monthly fluxes for 1997 to 2001 at a 3.75° × 5° resolution. Results show that the inversion is able to infer realistic relationships between the selected variables and flux, with leaf area index and the fraction of canopy-intercepted photosynthetically active radiation (fPAR) capturing a large portion of the biospheric signal, and gross domestic product and population densities explaining approximately three quarters of the expected fossil fuel emissions signal. The extended model is able to better constrain estimates in regions with sparse measurements, as confirmed by a reduction in the a posteriori uncertainty at the grid and aggregated continental scales, as compared to the inversion presented in the companion paper (Mueller et al., 2008).

1. Introduction

[2] Atmospheric inverse modeling is a technique using observed variability in atmospheric concentration measurements and an atmospheric transport model to infer CO2 sources and sinks at relatively large spatial scales. Given the sparsity of the current atmospheric monitoring network and the diffusive nature of atmospheric transport, however, inverse problems aimed at CO2 flux estimation are ill-posed and frequently underdetermined. To circumvent these problems, most previous inverse modeling studies have used a synthesis Bayesian inversion approach, where a priori assumptions about both the magnitude and spatial patterns of fluxes are included in the inversion. This prior information is typically derived from biospheric model output, extrapolated ocean ship-track data, and fossil fuel inventories, and is then updated using atmospheric CO2 observations [e.g., Kaminski et al., 1999; Rödenbeck et al., 2003; Gurney et al., 2004; Baker et al., 2006].

[3] Geostatistical inverse modeling differs from these previous approaches by eliminating the need for explicit prior flux estimates, thereby allowing for more strongly atmospheric data-driven estimates of global flux distributions [Michalak et al., 2004; Mueller et al., 2008]. The geostatistical approach uses a modified Bayesian setup to estimate the flux distribution as the sum of a deterministic but unknown spatial and temporal trend, and a stochastic spatially and/or temporally autocorrelated flux residual. The trend in a geostatistical framework can be as simple as global average land and ocean fluxes [Michalak et al., 2004], but could also include linear combinations of grid-scale auxiliary environmental data sets related to CO2 flux. The stochastic component of a geostatistical estimate represents features of the flux distribution that are inferred from the CO2 observations, but that cannot be explained using the covariates included in the trend.

[4] In a companion paper, Mueller et al. [2008] demonstrated the ability of the geostatistical approach to recover monthly regional (3.75° × 5°) CO2 fluxes using atmospheric concentration data from a subset of the NOAA-ESRL cooperative air sampling network [Tans and Conway, 2005]. In that application, the trend was defined as monthly varying land and ocean global average fluxes. Mueller et al. [2008] showed that the information content of available atmospheric measurements was sufficient to constrain fluxes at aggregated continental scales, particularly on land. Grid-scale estimates, however, had limited subcontinental spatial variability and high a posteriori uncertainties.

[5] In the current work, monthly CO2 fluxes and their uncertainties are estimated for 1997 to 2001 at 3.75° × 5° resolution within a geostatistical inverse modeling framework incorporating auxiliary environmental variables. The primary objective of the current paper is to investigate the additional constraint provided by these auxiliary data sets on a posteriori flux distributions and uncertainties. These data sets, which significantly explain flux variability evident from the atmospheric data, may include variables such as leaf area index and gross domestic product, which correlate well with the spatiotemporal pattern of biospheric and anthropogenic CO2 exchange, respectively. Given their global coverage, these variables also provide information about flux in regions underconstrained by the atmospheric measurements. Overall, the auxiliary variables should allow the inversion to recover more realistic CO2 flux variability with lower a posteriori uncertainties, relative to a setup relying exclusively on the limited atmospheric CO2 measurement network. The impact of these variables on a posteriori estimates and their associated uncertainties is investigated at two spatial (grid and continental) and two temporal (monthly and annual) scales, and compared to the results presented by Mueller et al. [2008].

[6] The second objective of this study is to investigate the relationships between the selected auxiliary data sets and flux as identified using the atmospheric CO2 observations, the uncertainties associated with this inferred model, and the impact of this uncertainty on the overall a posteriori uncertainty associated with the flux distribution. The relationships between each of the variables and the estimated fluxes are not prespecified in a geostatistical inversion, but rather quantified using the atmospheric observations. If the environmental data sets are relatively objective quantities with global coverage, the inclusion of auxiliary variables in the inverse model can incorporate process-based information into the final flux estimates while minimizing assumptions about the relationship between the auxiliary data and CO2 flux.

[7] Note that the presented application estimates the total CO2 flux, including the biospheric, anthropogenic and oceanic components, as was also done by Mueller et al. [2008]. This is in contrast to other inversion studies which considered fossil fuel emissions well-known and estimated only the biospheric and oceanic portions of the flux distribution [e.g., Rödenbeck et al., 2003; Baker et al., 2006]. By presubtracting a static data set of fossil fuel emissions from the observational data, previous inversion studies aliased any spatial and temporal uncertainty in the fossil fuel flux distribution onto the biospheric fluxes or nearby ocean regions. Given that fossil fuel emissions, at least in the Northern Hemisphere, are known to vary seasonally, presubtracting assumed annual fossil fuel emissions can confound the interpretation of a posteriori fluxes [Gurney et al., 2005].

[8] The paper is organized as follows. Section 2 presents an overview of the geostatistical inverse modeling method, with an emphasis on the approach used for incorporating auxiliary environmental data into the estimation. Section 3 presents the results of the analysis, including the selected auxiliary variables and their impact on flux estimates. Section 4 summarizes the main conclusions of the study.

2. Methods

[9] The surface flux estimates presented in this paper are obtained using a geostatistical inverse modeling approach, a full description of which is provided in the companion paper [Mueller et al., 2008]. This section presents a summary of the method, as well as a description of extensions developed and implemented in the current work. A diagram of the overall algorithm is presented in Figure 1.

Figure 1.

Schematic of geostatistical inversion components and algorithm, which are identical to those presented by Mueller et al. [2008] with the exception of the variable selection step. White boxes indicate inversion inputs, light gray boxes indicate inversion steps, and dark gray boxes represent inversion outputs. Gray circles indicate the sequence of steps in the algorithm.

2.1. Geostatistical Inverse Modeling Objective Function

[10] Geostatistical inverse modeling is a Bayesian approach that does not rely on prior estimates of the magnitude and spatial distribution of surface fluxes. The approach models the flux distribution as the sum of a deterministic but unknown component, Xβ, referred to as the model of the trend, and a zero-mean stochastic component with a spatial and/or temporal autocorrelation described by the covariance matrix Q. The model of the trend defines the portion of the flux signal that can be explained by a set of covariates included in the matrix X. This spatiotemporal trend can be as simple as a constant mean, but can also include linear relationships with any number of auxiliary variables related to flux. In the discussion that follows, m represents the number of estimated fluxes, n is the number of atmospheric concentration measurements, and p is the number of components within the model of the trend.

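[11] The objective function $L_{\mathbf{s},\boldsymbol{\beta}}$ for a geostatistical inversion is defined as

$$
L_{\mathbf{s},\boldsymbol{\beta}} = \frac{1}{2}(\mathbf{z} - \mathbf{H}\mathbf{s})^T \mathbf{R}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{s}) + \frac{1}{2}(\mathbf{s} - \mathbf{X}\boldsymbol{\beta})^T \mathbf{Q}^{-1} (\mathbf{s} - \mathbf{X}\boldsymbol{\beta}) \tag{1}
$$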

where z is an n × 1 vector of atmospheric concentration measurements, H is an n × m matrix defining the sensitivity of each available measurement to each estimated flux, X (m × p) is a prespecified matrix defining the structure of the model of the trend, and β (p × 1) are the estimated coefficients relating the components in X to the estimated fluxes s (m × 1). Q is an m × m matrix representing the a priori spatiotemporal covariance of flux deviations from Xβ, and R is an n × n diagonal matrix representing the variance of measurement, transport and representation errors for each observation. Further descriptions of each of these components are presented in the following sections and in the companion paper [Mueller et al., 2008].
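As an illustration of how the two terms of the objective function trade off, the following minimal NumPy sketch evaluates it for a toy problem. It assumes the usual quadratic form (half the R-weighted data misfit plus half the Q-weighted departure of the fluxes from the trend). The function name `objective` and all toy dimensions and values are hypothetical and are not taken from the study.

```python
import numpy as np

def objective(s, beta, z, H, X, Q, R):
    """Evaluate the geostatistical objective for fluxes s and drift coefficients beta."""
    data_resid = z - H @ s          # n-vector: observed minus modeled concentrations
    trend_resid = s - X @ beta      # m-vector: fluxes minus the model of the trend
    data_term = 0.5 * data_resid @ np.linalg.solve(R, data_resid)
    trend_term = 0.5 * trend_resid @ np.linalg.solve(Q, trend_resid)
    return float(data_term + trend_term)

# Toy setup (illustrative values only): n = 4 observations, m = 6 fluxes, p = 2 covariates.
rng = np.random.default_rng(0)
n, m, p = 4, 6, 2
H = rng.normal(size=(n, m))                      # stand-in transport sensitivities
X = np.column_stack([np.ones(m), np.arange(m, dtype=float)])
Q = np.eye(m)                                    # uncorrelated flux residuals, for simplicity
R = 0.25 * np.eye(n)                             # diagonal model-data mismatch
beta = np.array([1.0, -0.5])

s_on_trend = X @ beta                            # fluxes lying exactly on the trend
z_exact = H @ s_on_trend                         # observations those fluxes reproduce exactly

cost_at_trend = objective(s_on_trend, beta, z_exact, H, X, Q, R)
cost_shifted = objective(s_on_trend + 1.0, beta, z_exact, H, X, Q, R)
```

Fluxes that sit on the trend and reproduce the observations incur no cost, while any departure is penalized through both terms, with R and Q setting the relative weights.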

2.2. Observational Data (z) and Transport Model (H)

[12] Monthly averaged atmospheric CO2 flask measurements (z) from 44 unevenly spaced global measurement locations within the NOAA-ESRL cooperative air sampling network [Tans and Conway, 2005] are used to constrain the global flux distribution, together with a transport matrix, H, describing the sensitivity of measured concentrations to estimated fluxes. These components of the inversion are identical to those presented in the companion paper [Mueller et al., 2008]. The observational subset in z is similar to that used by Rödenbeck et al. [2003], and the number of measurements in any given month ranges from 35 to 42 between 1997 and 2001. The H matrix was derived from an adjoint implementation of the atmospheric transport model TM3 [Heimann and Körner, 2003], which has a spatial resolution of 3.75° latitude by 5° longitude with 19 vertical levels, and is driven by interannually varying winds from the NCEP Reanalysis [Kalnay et al., 1996].

2.3. Model of the Trend (Xβ)

2.3.1. Structure of the Model of the Trend

[13] The X matrix (m × p) defines the structure of the model of the trend, and includes values for a selected subset of environmental variables that covary with flux. Each of the p covariates is defined at the time and location of each of the m estimated fluxes. The β vector (p × 1) of coefficients, estimated as part of the inversion, corresponds to the variables in X and represents the linear relationships between the variables and CO2 flux, as seen through the atmospheric data. The overall trend Xβ is conceptually similar to a multivariate linear regression where the components in X are predictor variables that explain some portion of the flux variability, and β are the coefficients on these variables. However, unlike multivariate linear regression, the relationships are estimated in an inverse modeling framework (using concentration measurements to infer the covariates of flux), and the approach does not assume independent residuals. In order to be consistent with terminology commonly used in statistics, the β values in this study will henceforth be referred to as drift coefficients.

[14] The simple model of the trend implemented by Mueller et al. [2008] includes estimated average fluxes for each calendar month over land and ocean, and thereby captures both seasonal variability and differences in the expected flux magnitude over these separate spatial domains. The model of the trend presented in the current study replaces these monthly average land fluxes with a subset of spatially and temporally varying auxiliary environmental variables, selected using the procedure presented in section 2.3.3. In addition, a monthly varying terrestrial latitudinal gradient, expressed as sin (2 × latitude), is included to represent the expected opposing sources and sinks in the Northern and Southern Hemispheres. The strength and direction of this gradient are allowed to vary monthly, in order to reflect the seasonality in the two hemispheres. A monthly varying spatially constant mean is also assumed for ocean fluxes, as in the Mueller et al. [2008] study.

[15] Overall, the structure of the trend is represented by an (m × p) matrix X, where p = 24 + k. The first 24 columns contain the monthly terrestrial latitudinal flux gradients and ocean constants, and the subsequent k columns contain the auxiliary variables for each month and location, i.e.,

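$$
\mathbf{X} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 & \cdots & \mathbf{A}_{12} & \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_k \end{bmatrix} \tag{2}
$$

where $\mathbf{b}_i$ (m × 1) includes values of the ith auxiliary variable, and $\mathbf{A}_j$ (m × 2) contains nonzero entries only for fluxes within a single calendar month j. For a given month, the relevant portion of $\mathbf{A}_j$, defined as the 3456 × 2 matrix $\mathbf{a}_j$, contains values of sin (2 × latitude) for land grid cells in the first column, and ones for ocean grid cells in the second column, with zeros elsewhere, so that row r of $\mathbf{a}_j$ is

$$
[\mathbf{a}_j]_{r} =
\begin{cases}
\begin{bmatrix} \sin(2 \times \text{latitude}_r) & 0 \end{bmatrix} & \text{if grid cell } r \text{ is land} \\
\begin{bmatrix} 0 & 1 \end{bmatrix} & \text{if grid cell } r \text{ is ocean}
\end{cases} \tag{3}
$$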
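A minimal sketch of how a trend matrix with this structure could be assembled is given below. It assumes that fluxes are ordered month by month (all grid cells for the first month, then all cells for the second, and so on) and that the two columns for calendar month j sit next to each other. Neither ordering is specified in the text, and the function name `build_trend_matrix` and the toy inputs are hypothetical.

```python
import numpy as np

def build_trend_matrix(lat_deg, is_land, n_months, aux):
    """Assemble X = [A_1 ... A_12 | b_1 ... b_k] for n_months of gridded fluxes.

    lat_deg : (n_cells,) grid-cell center latitudes in degrees
    is_land : (n_cells,) boolean land mask
    aux     : (n_cells * n_months, k) auxiliary variables, month-major order (assumed)
    """
    lat_deg = np.asarray(lat_deg, dtype=float)
    is_land = np.asarray(is_land, dtype=bool)
    n_cells = lat_deg.size
    m = n_cells * n_months
    if aux.shape[0] != m:
        raise ValueError("aux must have one row per estimated flux")

    # Single-month block a_j: sin(2 * latitude) on land, a constant of one over ocean.
    a = np.zeros((n_cells, 2))
    a[is_land, 0] = np.sin(2.0 * np.deg2rad(lat_deg[is_land]))
    a[~is_land, 1] = 1.0

    # Place the block in the two columns belonging to each month's calendar month.
    A = np.zeros((m, 24))
    for t in range(n_months):
        j = t % 12                                  # calendar month index, 0..11
        rows = slice(t * n_cells, (t + 1) * n_cells)
        A[rows, 2 * j:2 * j + 2] = a
    return np.hstack([A, aux])

# Toy example: 3 grid cells, 13 months, k = 2 auxiliary variables (illustrative values).
lat = np.array([45.0, -45.0, 0.0])
land = np.array([True, False, True])
aux = np.ones((3 * 13, 2))
X = build_trend_matrix(lat, land, 13, aux)
```

For the grid used in the study, the number of cells per month is 48 × 72 = 3456, which matches the stated size of the single-month block.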

2.3.2. Auxiliary Environmental Variables

[16] The goal of incorporating auxiliary variables associated with carbon cycle processes into the model of the trend is to better represent the expected spatial and temporal variability of a posteriori grid-scale flux estimates, while only including variables that provide significant information as seen through the atmospheric monitoring network. A preliminary set of auxiliary variables with global coverage for the study period was selected on the basis of known associations with biospheric or fossil fuel fluxes. In contrast, few oceanic variables with complete spatial and temporal coverage are available for 1997 to 2001. Available oceanic variables, such as sea surface temperature, were initially considered but eliminated from further consideration given preliminary results showing that the atmospheric data were not able to infer physically reasonable relationships to ocean flux. As more oceanic data sets with gridded, global coverage become available, especially from the MODIS (Moderate Resolution Imaging Spectroradiometer) instruments on the Terra and Aqua satellites, future geostatistical inversion studies may be able to make use of this information to better explain oceanic flux variability.

[17] The auxiliary variables considered in this study are presented in the first column of Table 1, and described below. All variables were regridded from their native resolutions to the 3.75° × 5° resolution of this study using area-weighted averaging.
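The area-weighted averaging step can be sketched as follows, under the simplifying assumption that the native grid is a regular latitude-longitude grid whose cells nest exactly inside the 3.75° × 5° cells. The native grids of the individual data sets are not described in the text, so a real regridding would also need to handle partially overlapping cells. The function name `regrid_area_weighted` is hypothetical.

```python
import numpy as np

def regrid_area_weighted(fine, lat_edges_deg, block_lat, block_lon):
    """Average a fine lat-lon field onto coarse cells made of block_lat x block_lon fine cells.

    fine          : (n_lat, n_lon) field on the fine grid
    lat_edges_deg : (n_lat + 1,) ascending latitude edges of the fine rows, in degrees
    """
    n_lat, n_lon = fine.shape
    if n_lat % block_lat or n_lon % block_lon:
        raise ValueError("fine grid must nest exactly inside the coarse grid")

    # On a sphere, the area of a lat-lon cell of fixed longitudinal width is
    # proportional to the difference in sin(latitude) between its edges.
    band_weight = np.diff(np.sin(np.deg2rad(lat_edges_deg)))
    weights = np.repeat(band_weight[:, None], n_lon, axis=1)

    blocks = (n_lat // block_lat, block_lat, n_lon // block_lon, block_lon)
    weighted_sum = (fine * weights).reshape(blocks).sum(axis=(1, 3))
    weight_sum = weights.reshape(blocks).sum(axis=(1, 3))
    return weighted_sum / weight_sum

# Example: a hypothetical 0.25-degree field averaged to 3.75 x 5 degrees (15 x 20 fine cells each).
fine_field = np.ones((720, 1440))
fine_lat_edges = np.linspace(-90.0, 90.0, 721)
coarse_field = regrid_area_weighted(fine_field, fine_lat_edges, 15, 20)
```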

Table 1. Auxiliary Variables and Their Observed Significance Levels for Each Round of the Variance Ratio Test^a

| Variable | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 | Round 6 |
| --- | --- | --- | --- | --- | --- | --- |
| GDP density | **<10⁻¹⁶** | | | | | |
| Population density | <10⁻¹⁶ | 3 × 10⁻⁸ | 3 × 10⁻¹⁰ | 2 × 10⁻⁸ | **3 × 10⁻⁵** | |
| LAI | 0.37 | **3 × 10⁻³** | | | | |
| fPAR | 7 × 10⁻⁶ | 0.39 | **<10⁻¹⁶** | | | |
| NDVI | 4 × 10⁻⁴ | 0.80 | <10⁻¹⁶ | 0.31 | 0.04 | 0.41 |
| Shortwave radiation | 0.03 | 0.29 | 5 × 10⁻⁵ | 0.64 | 0.01 | 0.12 |
| Surface air temperature | 10⁻⁴ | 0.02 | 4 × 10⁻⁶ | 2 × 10⁻³ | 3 × 10⁻³ | 0.06 |
| Precipitation | 6 × 10⁻⁹ | 7 × 10⁻⁴ | 1 × 10⁻¹¹ | 3 × 10⁻³ | 0.02 | 0.25 |
| Percent agricultural land | <10⁻¹⁶ | 10⁻⁷ | 6 × 10⁻¹⁴ | 3 × 10⁻⁴ | 0.06 | 0.81 |
| Percent forest cover | 10⁻⁸ | 3 × 10⁻⁵ | <10⁻¹⁶ | 0.03 | 0.97 | 0.51 |
| Percent forest/shrub cover | 0.87 | 0.59 | 0.04 | 3 × 10⁻⁷ | 0.97 | 0.51 |
| Percent grassland | 0.12 | 0.30 | 0.81 | 0.01 | 0.22 | 0.38 |
| Percent shrub cover | 3 × 10⁻⁹ | 2 × 10⁻⁷ | 6 × 10⁻⁶ | **2 × 10⁻¹¹** | | |

^a Variables included in the model of the trend are bold.

Downwelling Shortwave Radiation

[18] Downwelling shortwave radiation is approximately proportional to the amount of photosynthetically active radiation (PAR), which drives photosynthesis. Downwelling shortwave radiation data over land were obtained for 1997–2001 from the National Centers for Environmental Prediction (NCEP) reanalysis [Kalnay et al., 1996].

Surface Air Temperature

[19] Surface air temperature is positively correlated with PAR (and hence photosynthesis), as well as with the rates of all metabolic reactions including respiration. Surface air temperature data were obtained from the NCEP/NCAR Reanalysis Monthly Means [Kalnay et al., 1996].

Precipitation

[20] Precipitation affects water availability, and thereby affects both plant growth and soil respiration. The absence of precipitation, or drought, can limit CO2 uptake and also promote forest fires. A precipitation data set was obtained from the Monitoring Product of the Global Precipitation Climatology Centre in Germany [Adler et al., 2003].

Palmer Drought Severity Index (PDSI)

[21] The Palmer Drought Severity Index (PDSI) tracks atmospheric moisture at the surface of the Earth relative to local mean conditions, and is calculated using both precipitation and surface air temperature. The index was formulated by Palmer [1965] as a hydrological accounting system for the central United States, and was subsequently extended globally by Dai et al. [2004].

Vegetation Indices: LAI, NDVI, fPAR

[22] The Normalized Difference Vegetation Index (NDVI) is the dimensionless normalized difference between near-infrared and visible (red) surface reflectances. Because leaves absorb visible but reflect near-infrared radiation, NDVI is a measure of green leafy biomass. Leaf area index (LAI) is the total surface area of leaves per unit ground area (m²/m²). The absorbed fraction of photosynthetically active radiation (fPAR) is the fraction of incident photosynthetically active radiation absorbed by the plant canopy. NDVI was sourced from the GIMMS data set, version g [Tucker et al., 2005], based on radiances from the Advanced Very High Resolution Radiometer (AVHRR). fPAR was estimated from the NDVI data using the average of the simple ratio and NDVI methods [Los et al., 2001; Schaefer et al., 2002, 2005], and LAI was estimated from fPAR by inverting Beer's law assuming leaf radiative characteristics from Sellers et al. [1996].

Land Cover

[23] Different land cover types are associated with varying levels of net primary productivity (NPP). The DISCover Global Land Cover data set, obtained from the Global International Geosphere-Biosphere Program [Loveland et al., 2001], contains 18 categories of land cover derived from satellite imagery recorded from April 1992 through March 1993. This data set was further binned into six categories: Forest, Shrub, Grassland, Agriculture, Barren (including Urban) and Inland Water, and a percent cover for each of these six land cover categories was calculated at the 3.75° × 5° resolution. Only Agricultural Land, Forest Cover, Shrub Cover, Grassland and a combined Forest/Shrub Cover category were selected for further assessment. These derived land cover variables form a static data set used for the full study period.

Population Density

[24] Fossil fuel emissions generally trend well with human population density, although some densely populated countries (e.g., Bangladesh, which is 9th in the world in population, but 69th in emissions [Marland et al., 2008; Central Intelligence Agency, 2007]) weaken this relationship. The population density data set used in this study was created by Environment Canada with support from the United Nations Environment Programme [Li, 1996].

GDP Density

[25] A global gridded gross domestic product (GDP) data set, representing the total economic output of the population living in a given area, was sourced from the International Satellite Land Surface Climatology Project Initiative II Data Collection [Yetman et al., 2004].

[26] The population and GDP data sets are static snapshots of the year 1990, and both are normalized by grid cell area to create a density indicator.
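The normalization by grid cell area can be illustrated with a short sketch that computes the area of a latitude-longitude cell and divides a cell total by it. A spherical Earth with a mean radius of 6371 km is assumed here. The text does not state which area formula or radius was used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def cell_area_km2(lat_south_deg, lat_north_deg, dlon_deg):
    """Area of a lat-lon cell on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    band = math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg) * band

def to_density(cell_total, lat_south_deg, lat_north_deg, dlon_deg=5.0):
    """Convert a per-cell total (e.g., people or GDP) into a per-km^2 density."""
    return cell_total / cell_area_km2(lat_south_deg, lat_north_deg, dlon_deg)

# Sanity check on the 3.75 x 5 degree grid: 48 latitude bands of 72 cells cover the sphere.
DLAT, DLON = 3.75, 5.0
global_area = sum(
    72 * cell_area_km2(-90.0 + i * DLAT, -90.0 + (i + 1) * DLAT, DLON)
    for i in range(48)
)
equatorial_cell = cell_area_km2(0.0, DLAT, DLON)
polar_cell = cell_area_km2(90.0 - DLAT, 90.0, DLON)
```

Because cells shrink toward the poles, the same total population or GDP corresponds to a higher density at high latitudes than at the equator.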

2.3.3. Variable Selection Using the Variance-Ratio Test

[27] The Variance-Ratio test [Kitanidis, 1997] is a hypothesis-based variable selection method that was originally developed to justify the inclusion of a more complex trend in geostatistical interpolation. A modified method, compatible with an inverse modeling setup, is presented and implemented here. In a geostatistical inversion, improving the model of the trend's ability to represent CO2 flux variability can increase the accuracy of, and reduce the a posteriori uncertainty associated with, the recovered flux distribution. However, adding auxiliary variables with only a spurious correlation to flux can bias the model, and yield unreasonable estimates in poorly constrained areas. The Variance-Ratio test is designed to balance the risks of including too few versus too many variables, by quantifying the significance of the improvement in model fit resulting from the addition of one or more variables to the model of the trend.

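[28] In this approach, the weighted sum of squares (WSS) of the orthonormal residuals is defined for an initial ($\mathbf{X}_0$, m × p) and an augmented ($\mathbf{X}_1$, m × (p + q)) model of the trend (where the columns of $\mathbf{X}_0$ are a subset of those of $\mathbf{X}_1$) as

$$
\mathrm{WSS}_0 = \mathbf{z}^T \left( \boldsymbol{\Psi}^{-1} - \boldsymbol{\Psi}^{-1}\mathbf{H}\mathbf{X}_0 \left( \mathbf{X}_0^T \mathbf{H}^T \boldsymbol{\Psi}^{-1} \mathbf{H} \mathbf{X}_0 \right)^{-1} \mathbf{X}_0^T \mathbf{H}^T \boldsymbol{\Psi}^{-1} \right) \mathbf{z} \tag{4}
$$

$$
\mathrm{WSS}_1 = \mathbf{z}^T \left( \boldsymbol{\Psi}^{-1} - \boldsymbol{\Psi}^{-1}\mathbf{H}\mathbf{X}_1 \left( \mathbf{X}_1^T \mathbf{H}^T \boldsymbol{\Psi}^{-1} \mathbf{H} \mathbf{X}_1 \right)^{-1} \mathbf{X}_1^T \mathbf{H}^T \boldsymbol{\Psi}^{-1} \right) \mathbf{z} \tag{5}
$$

where $\boldsymbol{\Psi} = \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}$ is used here as shorthand for the covariance of the measurements about the trend.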

WSS is a measure of fit that assesses how well the two trends, X0 and X1, explain the variability in fluxes as seen through the atmospheric concentration measurements, z, and weighted by the appropriate covariance matrices (R and Q). The WSS equation, as presented here, accounts for the spatial correlation of the residuals in order to create a test analogous to model selection for multivariate linear regression. The WSS equation was also modified for an inversion setup from that presented by Kitanidis [1997].

[29] A trend with more auxiliary variables will always be able to represent more of the inferred variability relative to a simpler model. Therefore, $\mathrm{WSS}_1$ is always less than or equal to $\mathrm{WSS}_0$, given that $\mathbf{X}_1$ includes all the variables in $\mathbf{X}_0$, as well as one or more additional variables. The significance of the improvement in model fit is evaluated using the normalized relative difference between $\mathrm{WSS}_0$ and $\mathrm{WSS}_1$,

$$
v = \frac{(\mathrm{WSS}_0 - \mathrm{WSS}_1)/q}{\mathrm{WSS}_1/(n - p - q)} \tag{6}
$$

and the level of significance is quantified using an F distribution with q and n − p − q degrees of freedom (where n represents the number of available measurements, p the number of components in $\mathbf{X}_0$, and q the number of additional components in $\mathbf{X}_1$ relative to $\mathbf{X}_0$).
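The test statistic can be sketched numerically as below. The sketch assumes that the weighted sum of squares is the generalized least squares residual quadratic form with weighting matrix HQHᵀ + R, which is the standard form for this kind of test. The function names and the toy data are hypothetical, and the toy dimensions are far smaller than those of the study.

```python
import numpy as np

def weighted_sum_of_squares(z, H, X, Q, R):
    """Residual quadratic form of z about the best-fit trend H X beta, weighted by Psi^-1."""
    Psi = H @ Q @ H.T + R                    # covariance of the observations about the trend
    HX = H @ X                               # trend covariates seen through transport
    Psi_inv_HX = np.linalg.solve(Psi, HX)
    beta_hat = np.linalg.solve(HX.T @ Psi_inv_HX, Psi_inv_HX.T @ z)
    resid = z - HX @ beta_hat
    return float(resid @ np.linalg.solve(Psi, resid))

def variance_ratio(z, H, X0, X1, Q, R):
    """Return the variance-ratio statistic and the two WSS values (X0 nested in X1)."""
    n = z.size
    p = X0.shape[1]
    q = X1.shape[1] - p
    wss0 = weighted_sum_of_squares(z, H, X0, Q, R)
    wss1 = weighted_sum_of_squares(z, H, X1, Q, R)
    v = ((wss0 - wss1) / q) / (wss1 / (n - p - q))
    return v, wss0, wss1

# Toy data (illustrative only): the third covariate genuinely contributes to the fluxes.
rng = np.random.default_rng(1)
n, m = 30, 40
H = rng.normal(size=(n, m))
X1 = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
X0 = X1[:, :2]
Q = np.eye(m)
R = 0.1 * np.eye(n)
z = H @ (X1 @ np.array([1.0, 2.0, 3.0])) + rng.normal(scale=0.3, size=n)

v, wss0, wss1 = variance_ratio(z, H, X0, X1, Q, R)
```

A p-value for the statistic can then be read from the upper tail of the F distribution, for example with `scipy.stats.f.sf(v, q, n - p - q)` if SciPy is available.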

[30] For this application, a trend with 12 monthly latitudinal land gradients and 12 monthly ocean constants is set as the initial model X0. The Variance-Ratio test is then run for each of the 14 candidate auxiliary variables (Table 1), adding each individually into X1 (i.e., q = 1). A single variable that significantly improves the trend is selected for inclusion, and this augmented trend becomes the new X0. The test is then performed again using each of the 13 remaining variables. Multiple rounds of the test are performed until no significant variables remain at the α = 0.05 significance level. Only a single variable is added in each round, even if more than one variable represents a significant improvement to the model. The choice among significant variables is based on each variable's relative level of significance, as well as the importance of its known association with key flux drivers (i.e., photosynthesis, respiration, fossil fuel emissions, etc.).
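The round-by-round procedure can be sketched as a simple forward-selection loop. Note that this sketch always picks the most significant candidate in each round, whereas the procedure described above also weighs each variable's known association with flux drivers, so it is a simplification rather than a reproduction of the selection used in the study. The function `forward_select`, the callable `p_value`, and the toy significance levels are all hypothetical.

```python
def forward_select(candidates, p_value, alpha=0.05):
    """Add one variable per round until no remaining candidate is significant.

    candidates : iterable of variable names
    p_value    : callable (selected, name) -> significance level of adding `name`
                 to a trend that already contains the variables in `selected`
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        # Test every remaining variable individually against the current trend (q = 1).
        scores = {name: p_value(selected, name) for name in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= alpha:
            break                      # nothing significant is left at level alpha
        selected.append(best)          # the augmented trend becomes the new X0
        remaining.remove(best)
    return selected

# Toy significance levels (made up): each variable becomes less significant as others are added.
toy_levels = {"a": 0.001, "b": 0.01, "c": 0.4}

def toy_p_value(selected, name):
    return toy_levels[name] * 2 ** len(selected)

chosen = forward_select(toy_levels, toy_p_value)
```

In practice `p_value` would wrap the Variance-Ratio test for the current and augmented trends, and a manual choice among the significant candidates could replace the `min` step.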

2.4. Covariance Matrices (Q and R)

[31] The covariance matrix Q represents the spatial autocorrelation of flux residuals from the trend, and therefore the magnitude of this correlation depends on the degree to which the model of the trend (Xβ) can represent the flux variability inferred using available observations. In an idealized case where the model of the trend captures all processes underlying the inferred spatial variability, the flux residuals would be uncorrelated random noise and Q would become a diagonal matrix. In practice, however, flux residuals are almost always correlated, and the goal of improving the trend, as discussed in section 2.3, is to decrease the magnitude of the residuals.

[32] As in the Mueller et al. [2008] study, the Q matrix is modeled using an exponentially decaying spatial correlation among flux residuals from the trend,

$$
Q_{ij}(h_{ij} \mid \sigma^2, l) = \sigma^2 \exp\left( -\frac{h_{ij}}{l} \right) \tag{7}
$$

where $h_{ij}$ is the separation distance between two estimation locations, $\sigma^2$ is the variance, and $l$ is the correlation length parameter. The practical correlation length is approximately $3l$, beyond which $\sigma^2$ represents the expected variance of independent flux residuals. Parameters for land and ocean fluxes are optimized separately, and no correlation is assumed between them, as described by Mueller et al. [2008].
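The exponential covariance model can be sketched as follows, assuming the form σ² exp(−h/l) and great-circle separation distances on a spherical Earth. The text does not state how separation distances were computed, so the haversine formula and the radius used here are assumptions, and the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def great_circle_km(lat_deg, lon_deg):
    """Pairwise great-circle distances (haversine formula) between grid-cell centers."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))[:, None]
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def exponential_covariance(h_km, sigma2, length_km):
    """Covariance sigma2 * exp(-h / l) for separation distances h."""
    return sigma2 * np.exp(-np.asarray(h_km, dtype=float) / length_km)

# Three illustrative locations on the equator, 0, 30 and 90 degrees of longitude apart.
h = great_circle_km([0.0, 0.0, 0.0], [0.0, 30.0, 90.0])
Q_land = exponential_covariance(h, sigma2=0.28, length_km=1800.0)

# Correlation remaining at the practical correlation length of 3 l.
corr_at_3l = exponential_covariance(3 * 1800.0, 1.0, 1800.0)
```

At a separation of 3l the correlation has dropped to exp(−3), or about 5%, which is why 3l is treated as the practical correlation length.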

[33] The model-data mismatch variances in the R matrix, which are assumed uncorrelated, include measurement, transport, and representation errors for each observation. These variances are assumed to be proportional to the square of the residual standard deviation (RSD) of flask observations from a smoothed curve [GLOBALVIEW-CO2, 2008], with the RSDs scaled by the proportion of real data in the record for each station [Gurney et al., 2003].

[34] The parameters of the Q and R matrices are optimized using the Restricted Maximum Likelihood (RML) method [Kitanidis, 1995; Michalak et al., 2004; Mueller et al., 2008], a quantitative approach that helps to reduce biases in the flux estimates associated with errors in the covariance matrices. The covariance parameters for the Q matrix are optimized using process-based and inventory flux estimates from the Carnegie-Ames-Stanford Approach (CASA) model [Randerson et al., 1997] for monthly net ecosystem production (NEP), Takahashi et al. [2002] for monthly oceanic net carbon exchange, and Brenkert [1998] for yearly averaged fossil fuel and cement production emissions. The scaling parameter, c, applied to the squared RSDs in the R matrix, is optimized using the atmospheric concentration measurements. The RML method is implemented for both the R and Q matrices using the model of the trend (X) derived from the variable selection process described in section 2.3.3.

2.5. Geostatistical Inversion System of Equations

[35] By minimizing the objective function defined in equation (1) with respect to s and β, the inversion simultaneously minimizes differences between the estimated fluxes (s) and the model of the trend (Xβ), and the residuals between actual atmospheric CO2 measurements (z) and concentrations derived from the estimated fluxes (Hs). The R and Q covariance matrices control the balance between achieving these two objectives. For example, low variances in the model-data mismatch covariance matrix (R) drive the inversion to reproduce the measurement data at the expense of keeping flux estimates close to the model of the trend. Also, in areas sensitive to the available measurements, as described through the matrix H, the inversion relies more heavily on reproducing observations, whereas in areas lacking measurements, the inversion reverts more strongly to the model of the trend (Xβ) and the spatial correlation of flux residuals (Q).

[36] Minimizing equation (1) with respect to fluxes, s, and drift coefficients, β, yields the following system of linear equations:

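$$
\begin{bmatrix} \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R} & \mathbf{H}\mathbf{X} \\ (\mathbf{H}\mathbf{X})^T & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \boldsymbol{\Lambda}^T \\ \mathbf{M} \end{bmatrix}
=
\begin{bmatrix} \mathbf{H}\mathbf{Q} \\ \mathbf{X}^T \end{bmatrix} \tag{8}
$$

The weights Λ (m × n) and Lagrange multipliers M (p × m) are used to define the estimated fluxes ($\hat{\mathbf{s}}$) and their posterior covariance ($\mathbf{V}_{\hat{\mathbf{s}}}$) as

$$
\hat{\mathbf{s}} = \boldsymbol{\Lambda}\mathbf{z} \tag{9}
$$

$$
\mathbf{V}_{\hat{\mathbf{s}}} = -\mathbf{X}\mathbf{M} + \mathbf{Q} - \mathbf{Q}\mathbf{H}^T\boldsymbol{\Lambda}^T \tag{10}
$$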

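Estimates of the drift coefficients, $\hat{\boldsymbol{\beta}}$, and their uncertainty covariance, $\mathbf{V}_{\hat{\boldsymbol{\beta}}}$, are calculated as

$$
\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^T\mathbf{H}^T (\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R})^{-1} \mathbf{H}\mathbf{X} \right)^{-1} \mathbf{X}^T\mathbf{H}^T (\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R})^{-1} \mathbf{z} \tag{11}
$$

$$
\mathbf{V}_{\hat{\boldsymbol{\beta}}} = \left( \mathbf{X}^T\mathbf{H}^T (\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R})^{-1} \mathbf{H}\mathbf{X} \right)^{-1} \tag{12}
$$

where the diagonal elements of $\mathbf{V}_{\hat{\boldsymbol{\beta}}}$ represent the uncertainties of the drift coefficients, and the off-diagonal terms represent their error covariances.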

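[37] The estimated fluxes ($\hat{\mathbf{s}}$) can also be expressed in a form more similar to that used in synthesis Bayesian inversions, as the sum of a deterministic component ($\mathbf{X}\hat{\boldsymbol{\beta}}$), i.e., the estimated model of the trend of the flux distribution, and a stochastic component that is a function of the a priori correlation structure in Q,

$$
\hat{\mathbf{s}} = \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{Q}\mathbf{H}^T (\mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R})^{-1} (\mathbf{z} - \mathbf{H}\mathbf{X}\hat{\boldsymbol{\beta}}) \tag{13}
$$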
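The system of equations in this section can be sketched numerically as below. The sketch assumes the standard geostatistical (kriging-type) block system for the weights Λ and Lagrange multipliers M, and then checks that the resulting flux estimate agrees with the trend-plus-stochastic-component form. The function name `geostatistical_inversion` and the toy inputs are hypothetical, and a problem of realistic size would call for a more careful numerical treatment than a dense solve.

```python
import numpy as np

def geostatistical_inversion(z, H, X, Q, R):
    """Solve for fluxes, drift coefficients and their covariances in a small dense problem."""
    n, m = H.shape
    p = X.shape[1]
    Psi = H @ Q @ H.T + R
    HX = H @ X

    # Block system: [[Psi, HX], [HX^T, 0]] [Lambda^T; M] = [H Q; X^T]
    lhs = np.block([[Psi, HX], [HX.T, np.zeros((p, p))]])
    rhs = np.vstack([H @ Q, X.T])
    solution = np.linalg.solve(lhs, rhs)
    Lam = solution[:n].T                      # weights, m x n
    M = solution[n:]                          # Lagrange multipliers, p x m

    s_hat = Lam @ z                           # best estimate of fluxes
    V_s = -X @ M + Q - Q @ H.T @ Lam.T        # posterior covariance of fluxes

    Psi_inv_HX = np.linalg.solve(Psi, HX)
    V_beta = np.linalg.inv(HX.T @ Psi_inv_HX) # covariance of drift coefficients
    beta_hat = V_beta @ (Psi_inv_HX.T @ z)    # drift coefficients
    return s_hat, V_s, beta_hat, V_beta

# Toy problem (illustrative only): 5 observations, 8 fluxes on a line, 2 trend covariates.
rng = np.random.default_rng(2)
n, m = 5, 8
H = rng.normal(size=(n, m))
X = np.column_stack([np.ones(m), np.linspace(-1.0, 1.0, m)])
position = np.arange(m, dtype=float)
Q = np.exp(-np.abs(position[:, None] - position[None, :]) / 2.0)
R = 0.1 * np.eye(n)
z = rng.normal(size=n)

s_hat, V_s, beta_hat, V_beta = geostatistical_inversion(z, H, X, Q, R)

# Equivalent form: estimated trend plus a stochastic correction driven by Q.
Psi = H @ Q @ H.T + R
s_alt = X @ beta_hat + Q @ H.T @ np.linalg.solve(Psi, z - H @ X @ beta_hat)
```

The agreement between `s_hat` and `s_alt` illustrates that the two expressions for the estimated fluxes are algebraically the same estimate.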

3. Results and Discussion

[38] This section presents CO2 fluxes estimated using a geostatistical inversion, which are informed both by atmospheric CO2 measurements and selected auxiliary environmental data. Results are also compared to those obtained by Mueller et al. [2008] using only the atmospheric data constraint.

3.1. Variance-Ratio Test and Selection of Auxiliary Variables

[39] The Variance-Ratio test is applied as described in section 2.3.3 to select a subset of auxiliary variables that best represent flux variability, as inferred using the atmospheric CO2 observations. As previously mentioned, the approach is complemented with scientific understanding regarding the variables and their relationship to flux processes to select among variables that are significant in each round of the test. Fully automatic model-building procedures are not recommended as a means for identifying the best interpretable model, because such procedures can potentially select models that represent only spurious relationships, and can fail when applied to comparable data sets [Judd and McClelland, 1989]. Note that the Variance-Ratio test determines the significance of the linear relationship between surface flux and auxiliary variables as identified through the relatively sparse atmospheric measurement network. Therefore, selected variables may be more representative of relationships in well-constrained regions.

[40] GDP Density is selected in the first round of auxiliary variable selection (Table 1) because it significantly improves the trend, and is believed to best isolate the fossil fuel emission signal, which is the largest single net source of CO2 on annual timescales. LAI is selected in the second round for its association with NPP, and because it is the most significant among the three vegetation indices. For all subsequent rounds, the most significant variable is selected for inclusion in the augmented model of the trend. These variables are fPAR, % Shrub Cover, and Population Density, in the third, fourth, and fifth rounds, respectively. No additional variables are significant beyond the fifth round. Results from the Variance-Ratio test also confirm that the monthly latitudinal gradients are a significant improvement upon the monthly land constants implemented by Mueller et al. [2008], a result which holds regardless of whether or not auxiliary variables are also included in the analysis.

[41] Overall, the selected variables are associated with different drivers of terrestrial CO2 flux, including photosynthesis, respiration, land cover type, and fossil fuel emissions. Additional auxiliary variables and/or functional forms could be applied in the future in order to capture additional processes (e.g., biomass burning, deforestation and oceanic productivity/gas exchange) and identify more complex or regional relationships between auxiliary variables and CO2 flux variability. However, given that geostatistical inversions estimate both the model of the trend and flux deviations from this trend, any processes that are not represented by the auxiliary variables are still captured in the final best estimates of flux through the stochastic component.

3.2. Optimized Covariance Parameters

[42] The optimized parameters for the covariance matrices (Q and R) are presented in Table 2 for the model of the trend presented in the last section, as well as the simple trend implemented by Mueller et al. [2008]. Both of the land Q parameters ($\sigma_Q^2$ and $l_Q$) show a significant decrease of approximately 30% from the simple to the complex trend. The optimized scaling parameter (c) for R decreases by 8%, a smaller but also significant change. Given the absence of any oceanic variables in the complex trend, the ocean Q parameters remain unchanged between the two trends.

Table 2. Optimized Model-Data Mismatch and Spatial Covariance Parameters With ±1 Standard Deviation for Simple and Complex Models of the Trendᵃ

Trend | Q land: σ² (μmol CO2/(m² s))² | Q land: l (km) | Q ocean: σ² (μmol CO2/(m² s))² | Q ocean: l (km) | R: c
Simple | 0.40 ± 0.03 | 2700 ± 200 | 0.0030 ± 0.0003 | 5700 ± 500 | 0.63 ± 0.04
Complex | 0.28 ± 0.01ᵇ | 1800 ± 100ᵇ | 0.0030 ± 0.0003 | 5700 ± 500 | 0.58 ± 0.04ᶜ

ᵃ Model-data mismatch, R; and spatial covariance, Q. Simple model from Mueller et al. [2008].
ᵇ Two standard deviation reduction from simple to complex trend.
ᶜ One standard deviation reduction from simple to complex trend.

[43] The reductions in the model-data mismatch parameter (c) and the land Q variance parameter (σ_Q²) provide additional confirmation that the complex trend is better able to represent the spatial variability of CO2 flux relative to the simple trend. The reduction in the estimated model-data mismatch demonstrates that fluxes estimated using the complex trend are better able to reproduce the atmospheric concentration measurements relative to those derived using the simple trend. The decrease in σ_Q² indicates that, as more of the flux variability is explained by an improved trend, the flux residuals decrease in magnitude. In other words, the complex trend explains a larger fraction of the variability of CO2 fluxes. Shorter correlation lengths in the residuals also indicate that more of the large-scale spatial variability is being captured by the complex model of the trend, leading to residuals that are correlated on shorter spatial scales. As will be discussed in section 3.5, the changes in the Q and R parameters also lead to a decrease in grid-scale a posteriori uncertainties for the best estimates of flux.
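The percentage changes quoted above can be checked directly against the values in Table 2. The following is a minimal arithmetic sketch: the dictionary and function names are illustrative only, and expressing each reduction in units of the simple-trend standard deviation is an assumption about how the table footnotes are to be read, not a statement of the test actually applied.

```python
# Optimized covariance parameters from Table 2, stored as
# (value, one standard deviation). Names are illustrative.
SIMPLE = {
    "land variance": (0.40, 0.03),
    "land correlation length": (2700.0, 200.0),
    "R scaling parameter c": (0.63, 0.04),
}
COMPLEX = {
    "land variance": (0.28, 0.01),
    "land correlation length": (1800.0, 100.0),
    "R scaling parameter c": (0.58, 0.04),
}


def percent_reduction(old, new):
    """Relative decrease from old to new, in percent."""
    return 100.0 * (old - new) / old


def reduction_in_sd(name):
    """Decrease in units of the simple-trend standard deviation.

    Using the simple-trend standard deviation as the yardstick is an
    assumption made for this sketch.
    """
    (old, sd_old), (new, _) = SIMPLE[name], COMPLEX[name]
    return (old - new) / sd_old


for name in SIMPLE:
    pct = percent_reduction(SIMPLE[name][0], COMPLEX[name][0])
    print(f"{name}: {pct:.0f}% lower, {reduction_in_sd(name):.2f} SD")
```

Under these assumptions the land variance and correlation length fall by roughly 30% and 33%, and the scaling parameter c by roughly 8%, in line with the figures given in the text.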

3.3. Estimated Drift Coefficients (β̂) and Contributions to Flux (Xβ̂)

[44] The estimated drift coefficients (β̂) corresponding to the auxiliary variables, their coefficients of variation (σβ̂), and the correlation coefficients (ρ) among them are presented in Table 3. A positive sign on the drift coefficients indicates a positive correlation with CO2 flux (i.e., a source or reduction in sink), while a negative sign indicates a negative correlation (i.e., a sink or reduction in source). A coefficient of variation less than 0.5 implies a significant contribution to the trend at the 2σβ̂ level, and all drift coefficients on the auxiliary variables are therefore significant at the 95% level. The recovered signs on the drift coefficients for the five auxiliary variables show that the inversion is able to infer reasonable relationships between these parameters and CO2 flux. GDP and Population Densities are associated with sources, as expected given their correlation with fossil fuel emissions, while the opposite signs on LAI and fPAR imply that these variables collectively represent the opposing photosynthesis and respiration signals. These results lend support to the validity of the Variance-Ratio test for selecting auxiliary variables, and provide indirect evidence that the improved model of the trend is able to correctly represent flux variability in the final flux estimates, particularly in underconstrained regions.
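The link between a coefficient of variation below 0.5 and significance at the two standard deviation level can be made explicit: a coefficient of variation of 0.5 means the estimate lies exactly two standard deviations from zero. The sketch below applies this to the coefficients of variation that are legible in Table 3; the fPAR entry is omitted because its value cannot be read from the table as extracted. The function names are hypothetical.

```python
# Coefficients of variation (standard deviation relative to the
# magnitude of the drift coefficient) as listed in Table 3.
COEFFICIENT_OF_VARIATION = {
    "GDP density": 0.35,
    "Population density": 0.26,
    "LAI": 0.08,
    "% Shrub Cover": 0.18,
}


def sigmas_from_zero(cv):
    """Distance of the estimate from zero in standard deviations."""
    return 1.0 / cv


def is_significant(cv, n_sigma=2.0):
    """True when the estimate is more than n_sigma deviations from zero."""
    return sigmas_from_zero(cv) > n_sigma


for name, cv in COEFFICIENT_OF_VARIATION.items():
    print(f"{name}: {sigmas_from_zero(cv):.1f} sigma, "
          f"significant at 2 sigma: {is_significant(cv)}")
```

All four listed variables sit well beyond two standard deviations from zero, with LAI the most tightly determined.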

Table 3. Estimated Drift Coefficients, Coefficients of Variation, Annual Average Global Contribution to Flux, and Correlation Coefficients Between Auxiliary Variables in the Model of the Trendᵃ

Variable | β̂ᵇ | σβ̂ | Xβ̂ (GtC/a) | ρ (GDP) | ρ (Pop) | ρ (LAI) | ρ (fPAR) | ρ (Shrub)
GDP density (thousands/(m² yr)) | 180 | 0.35 | 1.6 | 1.00 | | | |
Population density (people/m²) | 1700 | 0.26 | 3.2 | −0.42 | 1.00 | | |
LAI (m²/m²) | −0.49 | 0.08 | −…
Shrub Cover (%) | −0.0038 | 0.18 | −… −0.21 … 1.00
Land latitudinal gradients | −0.5 to 0.3 | 0.2 to 1.1 | 0.6 | | | | |
Ocean constants | −0.08 to 0.01 | 0.2 to 6.7 | −2.8 | | | | |
Complete trend | | | 3.8 | | | | |

ᵃ Drift coefficients, β̂; coefficients of variation, σβ̂; annual average global contribution to flux, Xβ̂; and correlation coefficients, ρ. Also shown is the range of monthly values for the individual β̂ and σβ̂ for the land latitudinal gradients and ocean averages, as well as their annual average global contribution to flux. The annual average contribution to flux of the complete trend represents a sum of the contributions by each of the previous components.
ᵇ The drift coefficients (β̂) have units of μmol CO2/(m² s) divided by the units of the individual auxiliary variables. Owing to differences in units on the auxiliary variables, the magnitudes of the drift coefficients are not directly comparable.

[45] The annually averaged global contribution to flux (X_iβ̂_i) in GtC/a is also displayed in Table 3 for each of the auxiliary variables, which makes it possible to assess the magnitudes of the recovered drift coefficients in consistent units. GDP and Population Densities together contribute a source of 4.8 GtC/a globally, which is approximately 70% of the estimated 6.7 GtC/a global source from fossil fuels and cement production over this period [Marland et al., 2008].

[46] LAI and fPAR have the largest annually averaged contributions to flux among the different components of the trend. These data sets have similar spatial patterns, and the collinearity between them, as demonstrated by the strong anticorrelation between their estimated drift coefficients (ρ = −0.94), implies that the interpretation of their combined contribution to flux is more reliable than that of their individual contributions. Specifically, the combined contribution of LAI and fPAR within the trend shows net sources and sinks on a seasonal basis that are consistent with the expected biospheric signal, while this contribution also plays a large role in defining the spatial variability of the overall terrestrial flux estimates, as shown in Figure 2. The combined annually averaged global contribution to flux of LAI, fPAR and % Shrub Cover is a source of 1.2 GtC/a, implying that these variables together represent a large portion of the biospheric signal, which has a strong seasonality but a relatively small annually averaged net flux.
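The statement that a combined contribution is more reliable than the individual ones when two coefficient estimates are strongly anticorrelated can be illustrated with a small calculation. The sketch below is not the geostatistical system used in this study: it assumes ordinary least squares with two standardized, hypothetical regressors whose mutual correlation is 0.94, for which the coefficient estimates have correlation −0.94. It then compares the variance of one coefficient with the variance of the sum of the two.

```python
import numpy as np


def coefficient_covariance(r, noise_variance=1.0):
    """Covariance of least-squares coefficients for two standardized
    regressors with correlation r (an idealized, assumed setting)."""
    gram = np.array([[1.0, r], [r, 1.0]])
    return noise_variance * np.linalg.inv(gram)


# Regressor correlation chosen so that the coefficient correlation
# matches the value reported for LAI and fPAR (rho = -0.94).
cov = coefficient_covariance(0.94)

rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
var_individual = cov[0, 0]

# Variance of the sum of the two coefficients: 1^T C 1.
ones = np.ones(2)
var_sum = ones @ cov @ ones

print(f"correlation between coefficient estimates: {rho:.2f}")
print(f"variance of one coefficient:  {var_individual:.2f}")
print(f"variance of the combined sum: {var_sum:.2f}")
```

In this idealized setting the variance of a single coefficient is about 8.6 times the noise variance, while the variance of the sum is about 1.03 times the noise variance, because the large negative covariance cancels most of the individual uncertainty.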

Figure 2.

(a) Contribution to flux estimates by LAI and fPAR within the model of the trend (Xβ̂) for May 2000, (b) contribution by LAI and fPAR for July 2000, (c) best estimates of flux (ŝ) for May 2000, and (d) best estimates of flux (ŝ) for July 2000.

[47] The positive drift coefficient associated with fPAR (representing sources or reductions in sinks) and the negative drift coefficient associated with LAI (representing the opposite) appear to contradict process-based understanding of the relationship between these variables and biospheric CO2 fluxes. Photosynthesis is frequently estimated from fPAR, given assumed rates of autotrophic respiration [Tucker and Sellers, 1986; Potter et al., 1993], while LAI, as a measure of biomass, is more commonly associated with autotrophic and heterotrophic respiration [e.g., Reichstein et al., 2003]. However, at the spatial and temporal resolution of this study, LAI appears to capture the strong seasonality expected for photosynthesis, while fPAR, with a weaker seasonal cycle, captures variability expected for total ecosystem respiration (Figure 3).

Figure 3.

Average monthly LAI and fPAR (from 1997 to 2001) for the combined Northern Hemisphere land regions of Boreal Asia, Europe, and Boreal North America (as defined in Figure 8).

[48] Figure 4 shows the contribution to the trend (X_iβ̂_i) by the monthly terrestrial latitudinal gradients, which show strong seasonality. For example, the latitudinal gradient in June shows a sink in the Northern Hemisphere midlatitudes with a corresponding source in the Southern Hemisphere, while the gradient shows the opposite flux pattern in January. This result demonstrates that the atmospheric data are able to correctly identify seasonal variability between the hemispheres that is unexplained by the other auxiliary variables within the trend. Eight of the twelve multipliers show a source in the Northern Hemisphere, likely as a result of the year-round fossil fuel sources from industrialized areas in North America, Europe and Asia that are not completely captured by the contributions of GDP density and population density within the model of the trend.

Figure 4.

Contribution to flux by the 12 monthly latitudinal land gradients within the model of the trend (Xβ̂).

[49] The complete model of the trend, including the latitudinal gradients, ocean constants and auxiliary variables, represents a 3.8 GtC/a annually averaged source to the atmosphere from 1997 to 2001. The overall annually averaged global flux estimate from the inversion is a source of 4.0 GtC/a, which indicates that the complex model of the trend captures approximately 95% of the global atmospheric increase on an annually averaged basis, and is therefore explaining a substantial portion of total flux at this aggregated scale. As shown in equation (13), the residual component of the flux estimates is explained by the stochastic component, such that the additional source apparent in the atmospheric measurements but not captured by the trend is still incorporated into the final flux estimates.
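The budget figures quoted in this section follow from simple sums of the annually averaged contributions. The sketch below reproduces that arithmetic from the values given in the text and in Table 3; the variable names are illustrative, and the 1.2 GtC/a entry is the combined LAI, fPAR and % Shrub Cover contribution stated in the text.

```python
# Annually averaged global contributions to flux, in GtC/a,
# as quoted in the text and in Table 3.
contributions = {
    "GDP density": 1.6,
    "Population density": 3.2,
    "LAI + fPAR + % Shrub Cover": 1.2,
    "Land latitudinal gradients": 0.6,
    "Ocean constants": -2.8,
}

trend_total = sum(contributions.values())

# Share of the inventory-based fossil fuel source (6.7 GtC/a)
# explained by GDP and population density together.
fossil_share = (contributions["GDP density"]
                + contributions["Population density"]) / 6.7

# Share of the total estimated flux (4.0 GtC/a) captured by the trend.
trend_share = trend_total / 4.0

print(f"complete trend: {trend_total:.1f} GtC/a")
print(f"fossil fuel signal explained: {100 * fossil_share:.0f}%")
print(f"total flux captured by trend: {100 * trend_share:.0f}%")
```

The components sum to 3.8 GtC/a, GDP and population density together account for about 72% of the 6.7 GtC/a fossil fuel source, and the trend captures about 95% of the 4.0 GtC/a total.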

3.4. Spatial Distribution of the Deterministic and Stochastic Components of the a Posteriori Flux Estimates (ŝ)

[50] Figure 5 illustrates the spatial distribution of each component of the model of the trend (Xβ̂), the spatially correlated flux residuals (QHᵀΨ⁻¹(z − HXβ̂)), and the best estimates of flux (ŝ) for July 2000. The shrublands in arid regions like central Australia and the boreal regions of North America and Asia show small negative contributions to the overall flux, while LAI and fPAR show large, but opposite, contributions to flux in vegetated areas, as previously discussed in section 3.3. Both GDP and population densities show positive contributions to flux, although their spatial patterns differ. The terrestrial latitudinal flux gradient reflects climatic variability unexplained by the other auxiliary variables, and shows the largest negative contribution to flux in the Northern Hemisphere midlatitudes for this month. It should be noted that % Shrub Cover, GDP density and population density are static data sets, and therefore the July 2000 contributions of these variables shown in Figure 5 represent only long-term average contributions to flux.

Figure 5.

Contribution of various components within the model of the trend (Xβ̂) toward the best estimates of flux ŝ in July 2000: (a) GDP density, (b) population density, (c) LAI, (d) fPAR, (e) % Shrub Cover, (f) latitudinal gradient and ocean constant, (g) stochastic component of best estimates, and (h) full best estimates ŝ.

[51] While the magnitude of the stochastic component is generally reduced as the ability of the trend to explain flux variability becomes stronger (as evidenced by the reduction in the σ_Q² land parameter shown in Table 2), the stochastic component associated with the flux estimates in July 2000 is still responsible for positive contributions over South America and most of North America, and slight negative contributions in northeast Asia, Australia and parts of Africa. In fact, the stochastic component adds a positive contribution to flux in tropical Central and South America for approximately eight months of each year of the inversion. This shows that although the complex model of the trend cannot capture a systematic flux signal in this region, possibly owing to the lack of auxiliary variables associated with biomass burning and/or deforestation, the stochastic component identifies a net additional source in these regions.

[52] In the Mueller et al. [2008] study, Xβ̂ is simply an average flux over land and an average flux over oceans for each calendar month. Therefore, the spatial variability of the best estimates at the grid scale is entirely determined by the spatially correlated stochastic component. In contrast, for the complex trend with auxiliary variables, each component within the trend adds an additional layer of spatial variability to the a posteriori flux estimates, weighted by that component's estimated relationship to flux (β̂). Therefore, the complex trend inversion is able to more realistically represent grid-scale variability without relying on the use of explicit prior flux estimates used in synthesis Bayesian inversions.
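The split of the best estimate into a deterministic part Xβ̂ and a stochastic part QHᵀΨ⁻¹(z − HXβ̂) can be sketched on a toy problem. The example below assumes the standard geostatistical relations Ψ = HQHᵀ + R and β̂ = (XᵀHᵀΨ⁻¹HX)⁻¹XᵀHᵀΨ⁻¹z; the grid size, transport matrix, covariate and covariance parameters are all synthetic and hypothetical, chosen only to make the sketch run, and do not correspond to the setup of this study.

```python
import numpy as np


def geostatistical_estimate(H, X, Q, R, z):
    """Return drift coefficients and the two components of the estimate.

    Assumes Psi = H Q H^T + R, with the drift coefficients obtained by
    generalized least squares on the observations.
    """
    Psi = H @ Q @ H.T + R
    HX = H @ X
    Psi_inv_HX = np.linalg.solve(Psi, HX)
    A = HX.T @ Psi_inv_HX
    beta_hat = np.linalg.solve(A, Psi_inv_HX.T @ z)
    deterministic = X @ beta_hat
    stochastic = Q @ H.T @ np.linalg.solve(Psi, z - HX @ beta_hat)
    return beta_hat, deterministic, stochastic


# Synthetic one-dimensional "grid" (all values hypothetical).
rng = np.random.default_rng(0)
m, n, p = 12, 5, 2                      # grid cells, observations, trend terms
coords = np.linspace(0.0, 11.0, m)

# Trend matrix: a constant plus one made-up auxiliary variable.
X = np.column_stack([np.ones(m), np.sin(coords / 2.0)])

# Exponential spatial covariance for flux residuals; diagonal mismatch.
Q = 0.3 * np.exp(-np.abs(coords[:, None] - coords[None, :]) / 3.0)
R = 0.05 * np.eye(n)

# Random stand-in for a transport (sensitivity) matrix.
H = rng.random((n, m))

# Simulated "true" fluxes and noisy observations.
s_true = X @ np.array([1.0, -0.5]) + rng.multivariate_normal(np.zeros(m), Q)
z = H @ s_true + rng.normal(0.0, np.sqrt(0.05), n)

beta_hat, deterministic, stochastic = geostatistical_estimate(H, X, Q, R, z)
s_hat = deterministic + stochastic

print("estimated drift coefficients:", np.round(beta_hat, 3))
print("largest stochastic correction:", np.round(np.abs(stochastic).max(), 3))
```

With a trend matrix containing only constants, all grid-scale structure in the estimate would come from the stochastic term; adding a column to X moves part of that structure into the deterministic term, which is the behavior described above.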

3.5. A Posteriori Grid-Scale Uncertainty Reduction From Simple to Complex Trend

[53] The greater ability of the complex trend to capture flux variability, relative to the simple trend implemented by Mueller et al. [2008], is reflected in the reduction in the optimized land variance parameter in the Q matrix and the scaling parameter in the R matrix, and leads to an overall decrease in a posteriori uncertainty on the flux estimates (see equation (10)). Figure 6 shows the average percent change in uncertainty at the grid scale between the simple and the complex trend inversion for the year 2000. The uncertainty on land is reduced by up to 14%, with higher decreases in underconstrained areas such as Africa, South America and Southeast Asia, which are now informed by a better deterministic model of the trend. For the oceans, the uncertainty is reduced by approximately 2% for most regions. Whereas the reduction in the variances in Q and R leads to a direct decrease in the a posteriori uncertainties, including additional variables in the model of the trend also leads to additional uncertainties resulting from the estimation of the corresponding drift coefficients (β̂). These uncertainties contribute to the final a posteriori uncertainties through the term −XM in equation (10), which is always positive. Therefore, some regions actually show a slight increase in a posteriori uncertainty when moving from the simple to the complex trend. For example, the high values of GDP and Population Densities in central Europe, China and Bangladesh lead to increases in estimated uncertainty of up to 11%.
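The two competing effects described above can be separated numerically. Equation (10) is not reproduced in this section, so the sketch below assumes the standard geostatistical result, regrouped so that the a posteriori covariance is the sum of a known-trend part, Q − QHᵀΨ⁻¹HQ, and a part arising from estimation of the drift coefficients, GA⁻¹Gᵀ, with G = X − QHᵀΨ⁻¹HX and A = XᵀHᵀΨ⁻¹HX. All matrices and the covariate are synthetic and hypothetical. Q and R are held fixed here, so the sketch isolates only the uncertainty added by an extra trend variable, not the reduction that comes from smaller optimized variances.

```python
import numpy as np


def posterior_covariance_parts(H, X, Q, R):
    """Return (known-trend covariance, drift-estimation covariance)."""
    Psi = H @ Q @ H.T + R
    HX = H @ X
    QHt = Q @ H.T
    known_trend = Q - QHt @ np.linalg.solve(Psi, QHt.T)
    G = X - QHt @ np.linalg.solve(Psi, HX)
    A = HX.T @ np.linalg.solve(Psi, HX)
    drift_term = G @ np.linalg.solve(A, G.T)
    return known_trend, drift_term


# Synthetic setup (all values hypothetical).
rng = np.random.default_rng(1)
m, n = 12, 5
coords = np.linspace(0.0, 11.0, m)
Q = 0.3 * np.exp(-np.abs(coords[:, None] - coords[None, :]) / 3.0)
R = 0.05 * np.eye(n)
H = rng.random((n, m))

# Simple trend: one constant. Complex trend: constant plus a covariate
# that is zero in most cells and large in a few, loosely mimicking a
# density-type variable concentrated in a small number of grid cells.
X_simple = np.ones((m, 1))
covariate = np.zeros(m)
covariate[8:] = [1.0, 3.0, 6.0, 10.0]
X_complex = np.column_stack([np.ones(m), covariate])

known, drift_simple = posterior_covariance_parts(H, X_simple, Q, R)
_, drift_complex = posterior_covariance_parts(H, X_complex, Q, R)

var_simple = np.diag(known + drift_simple)
var_complex = np.diag(known + drift_complex)
increase = var_complex - var_simple

print("variance increase per cell:", np.round(increase, 4))
```

With Q and R fixed, the added variable can only leave the grid-scale variance unchanged or raise it, so a net decrease in uncertainty requires that the reduction in the optimized Q and R variances outweigh this term.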

Figure 6.

Percent change in a posteriori uncertainty (σŝ) from the simple to the complex trend inversion, annually averaged for year 2000. Triangles represent measurement locations.

[54] The general reduction in a posteriori grid-scale uncertainty (σŝ) from the simple to the complex trend shown in Figure 6 leads to a small increase in the number of significant terrestrial sources and sinks estimated at the grid scale (17% of grid cells for the simple trend versus 25% for the complex trend at the 1σŝ level, or 2% versus 6% at the 2σŝ level). Overall, however, grid-scale uncertainties are high relative to flux magnitudes in both inversions owing to the limited network of atmospheric measurements, as expected. Note that the reduction in uncertainty from the simple to the complex trend described here is not analogous to the reduction in uncertainty described in synthesis Bayesian inversion studies [e.g., Rödenbeck et al., 2003; Baker et al., 2006]. In synthesis Bayesian inversions, the a priori uncertainty is described by the matrix Q, whereas the a priori uncertainty in geostatistical inversions is effectively infinite given that there are no a priori assumptions about the drift coefficients β. Instead, the reduction in uncertainty reported here represents the relative constraints on fluxes achieved by two different inversion setups, namely those described by the simple and complex trends.

3.6. Continental-Scale Seasonal Cycle for Year 2000

[55] Figure 7 presents monthly flux estimates and 1σŝ confidence intervals for the year 2000 resulting from the simple and complex trend inversions, aggregated to the 22 TransCom regions [e.g., Gurney et al., 2003] shown in Figure 8. In some regions, such as Boreal North America, Temperate North America and Northern Africa, results from the application of the two trends are nearly identical. In other regions, the auxiliary variables and terrestrial latitudinal gradients in the complex trend have an impact on the flux estimates. For example, the complex trend inversion shows a larger summertime sink in Boreal Asia and Europe and a slightly higher year-round flux in Tropical Asia, with this latter result most likely due to the positive contribution to flux associated with densely populated areas in Bangladesh and southern China. The better constraint on terrestrial fluxes provided by the improved trend also slightly alters fluxes in nearby ocean regions.

Figure 7.

Monthly best estimates (ŝ) aggregated to 22 TransCom regions with 1σŝ confidence intervals for year 2000 for simple [Mueller et al., 2008] and complex trend inversions.

Figure 8.

Locations of 11 land and 11 ocean TransCom regions [e.g., Gurney et al., 2003].

[56] However, apart from these small differences, the magnitude and seasonality of aggregated fluxes inferred using the two trends agree well for both land and ocean regions. This result shows that there exists a relatively strong atmospheric constraint on the seasonal cycle of geostatistical flux estimates at the scale of the 22 TransCom regions, particularly important given that flux patterns at the grid-scale vary significantly between the two inversions. This result also supports the hypothesis that the flux estimates at the aggregated scale are representative of the information content of the atmospheric data.

3.7. Annually Averaged Aggregated Sources and Sinks

[57] Figure 9 presents annually averaged fluxes for 1997 to 2001 from the simple and complex trend inversions, aggregated to the 22 TransCom regions. Uncertainty associated with the annually averaged fluxes is 7% to 19% lower for land regions and 2% to 7% lower for ocean regions in the complex trend inversion relative to the simple trend inversion, demonstrating that the improved trend helps to better constrain flux estimates at aggregated spatial and temporal scales, as well as at the grid scale (as discussed in section 3.5).

Figure 9.

Annually averaged flux for simple and complex trend inversions for TransCom (a) land and (b) ocean regions for 1997 to 2001. Land fluxes include both biospheric and fossil fuel components. Error bars represent 1σŝ and 2σŝ confidence intervals.

[58] For the complex trend, most land regions show significant (1σŝ) net sources, whereas Boreal North America and Boreal Asia are flux-neutral, and Australia is a significant sink. The predominance of continental-scale terrestrial sources reflects the impact of fossil fuel emissions on the annually averaged CO2 fluxes. An analysis of the biospheric annually averaged flux, derived by subtracting fossil fuel inventory data [Brenkert, 1998] from the annual total values shown in Figure 9, shows that Temperate North America, Europe, Temperate Asia and Australia all act as significant biospheric sinks (1σŝ) in the complex trend inversion.

[59] For all ocean regions, fluxes from both inversions show a significant (1σŝ) sink, and the results from the two inversions are not significantly different from one another. However, as discussed by Mueller et al. [2008], the relatively constant oceanic flux estimates across regions reflect the limited information content of the atmospheric measurements, with oceanic flux estimates in many regions remaining close to the global average in the model of the trend. Despite the lack of oceanic auxiliary variables, a better constraint on terrestrial fluxes within the complex trend reduces the strength of the overall ocean sink (from −3.0 to −2.7 GtC/a), bringing these estimates into closer agreement with independent results from extrapolated ocean ship-track data [Takahashi et al., 2002] and inverse modeling studies that make direct use of these data [Rödenbeck et al., 2003; Baker et al., 2006].

[60] A few underconstrained land regions, such as Tropical Asia, Tropical America and Australia, show significant (1σŝ) changes in estimated average flux between the two inversions. The significant increase in Tropical Asia and decrease in Tropical America demonstrate that the addition of auxiliary information with global coverage helps to constrain regions remote from measurement locations (see map in Figure 6), especially given that the estimates obtained using the complex trend are closer to “bottom-up” estimates for these regions. For example, CASA estimates of net ecosystem exchange (NEE) [Randerson et al., 1997] with regional corrections for deforestation and regrowth, as applied by Baker et al. [2006], and fossil fuel emission estimates from Brenkert [1998] yield a 0.7 GtC/a source for Tropical America and a 1.4 GtC/a source for Tropical Asia, which are similar to the independent estimates obtained using the complex trend inversion. The significant decrease in the net flux from Australia, however, is not consistent with estimates from previous inverse modeling studies [Rödenbeck et al., 2003; Baker et al., 2006] and bottom-up models, which show a near-neutral biospheric flux. The stronger estimated sink in Australia is likely caused by the negative drift coefficient on % Shrub Cover in the complex model of the trend, together with the large areas of open shrublands in this region. Given that this drift coefficient represents a globally averaged estimated relationship between % Shrub Cover and CO2 flux, estimates in Australia may be unduly influenced by the relationship between shrublands and flux in the better-constrained boreal regions.

[61] The main conclusion to be drawn from the comparison between the annually averaged, continental-scale fluxes for the two trends is that, as with the seasonal cycle of continental-scale fluxes, there is a relatively strong atmospheric constraint on fluxes at this aggregated spatial scale. However, when aggregating in time, auxiliary variables can significantly impact the flux estimates for certain underconstrained regions in a manner consistent with process-based understanding of CO2 flux, where this improvement is contingent on the validity of assuming a global relationship between auxiliary variables and CO2 flux. Overall, as evidenced by lower a posteriori uncertainties, the complex trend inversion is better able to constrain annually averaged continental-scale fluxes relative to the simple trend inversion.

4. Conclusions

[62] This paper presents a method for incorporating auxiliary information provided by spatially distributed data sets associated with CO2 flux processes into a geostatistical inverse modeling approach. This approach is then used to estimate monthly averaged, global, grid-scale CO2 fluxes using concentration measurements from a subset of the NOAA-ESRL cooperative air sampling network. The auxiliary data sets with spatially and temporally heterogeneous global coverage help to constrain flux estimates, especially in regions far from measurement locations, and also help to recover fine-scale flux variability that cannot be inferred through the concentration data alone, owing to atmospheric transport and mixing. The resulting flux estimates therefore have more realistic variability at smaller scales, and have lower uncertainty, than those presented in the Mueller et al. [2008] geostatistical inversion study, which relies only on the information content of the atmospheric data. This conclusion is supported by the physically reasonable relationships (β̂) between the auxiliary variables and flux recovered by the inversion, as well as the reduction in grid-scale a posteriori uncertainty achieved by the complex model of the trend.

[63] The Variance-Ratio test is used to determine the combination of the candidate auxiliary variables that is best able to explain the flux variability evident in the atmospheric measurement data. From an initial superset of 14 auxiliary variables, five variables, associated with either biospheric activity or fossil fuel emissions, were found to significantly improve the model of the trend. An analysis of the estimated drift coefficients on the auxiliary variables shows that LAI and fPAR capture a substantial portion of the combined signal of photosynthesis and respiration. The negative drift coefficient for LAI and the positive one for fPAR are opposite to mechanistic relationships typically assumed between these variables and CO2 flux; however, an analysis of these data sets shows that the weaker seasonality in the fPAR data set relative to LAI allows this variable to more strongly explain the signal associated with total ecosystem respiration at the scales examined in this study. The drift coefficients for the other selected variables indicate that % Shrub Cover explains residual biospheric sinks (or decreases in sources), while GDP and Population Densities explain approximately 70% of the expected global fossil fuel emission signal. One aspect that is the subject of ongoing work is the impact of the assumption of a constant global relationship between the auxiliary variables and flux within the model of the trend, which is more strongly affected by fluxes in well-constrained regions.

[64] As reflected in the optimized covariance parameters associated with the flux residuals and the model-data mismatch, the model of the trend implemented in this study is able to explain significantly more of the flux variability evident from the atmospheric data relative to a simple model of the trend containing monthly flux averages over land and ocean, as implemented by Mueller et al. [2008]. The reduction in the covariance parameters leads to reduced a posteriori uncertainties on the flux estimates of up to 14% for the annually averaged grid-scale fluxes, and up to 19% at the annually averaged continental scale. This uncertainty reduction is strongest in underconstrained regions in Africa, South America and Southeast Asia.

[65] A comparison of the seasonal cycle of flux estimates at continental scales shows no significant differences between the simple trend inversion of Mueller et al. [2008] and the complex trend inversion implemented in this study. However, at the annually averaged continental scale, the auxiliary variables in the complex trend significantly change fluxes in a few terrestrial regions underconstrained by the measurement network, in a manner consistent with bottom-up understanding of flux in these regions. Conversely, the stronger inferred sink in Australia shows that a global average linear relationship between auxiliary variables and flux may not be representative for some regions or variables. Apart from these few terrestrial regions, the agreement among both the monthly and annually averaged fluxes at the continental scale points to a strong atmospheric constraint on flux estimates at spatially aggregated scales.

[66] Finally, the geostatistical inverse modeling approach presented here provides a method for validating scale-dependent understanding of the relationship between various data sets associated with CO2 flux processes and actual CO2 flux variability, as seen through the existing atmospheric monitoring network. In future work, the use of biospheric model output and nonlinear and regional relationships in the model of the trend could help to differentiate among competing hypotheses about processes controlling flux variability, and thereby contribute to process-based understanding of CO2 flux drivers. This approach will also continue to improve flux estimates, while minimizing a priori assumptions inherent to inversion studies. As such, the geostatistical approach provides a unique opportunity for reconciling top-down and bottom-up estimates of CO2 flux variability at various spatiotemporal scales.

Acknowledgments


[67] The authors gratefully acknowledge Adam Hirsch and Deborah Huntzinger for important feedback on this manuscript, as well as Andy Jacobson and Pieter Tans for discussion about this work. In addition, the authors thank NOAA-ESRL for providing the atmospheric CO2 concentration data used in this work, Christian Rödenbeck for providing the transport matrix (H) from his 2003 study, and Kevin Gurney for providing measurement residual standard deviation values for the examined measurement locations. Additionally, the authors thank Charles Humphriss for help with sourcing many of the auxiliary data sets used in this study. This material is based on work supported by the National Oceanic and Atmospheric Administration under contract RA133R-05-SE-5150 “Geostatistical Analysis of NOAA Climate Monitoring and Diagnostics Laboratory Carbon Dioxide Data for 1997-2001,” issued by the Climate Modeling and Diagnostics Laboratory, now part of the Earth System Research Laboratory. Additional support was provided by the National Aeronautics and Space Administration under grant NNX06AE84G “Constraining North American Fluxes of Carbon Dioxide and Inferring Their Spatiotemporal Covariances through Assimilation of Remote Sensing and Atmospheric Data in a Geostatistical Framework” issued through the ROSES A.6 North American Carbon Program to the University of Michigan, and grant NNX06AE65G to the National Snow and Ice Data Center, University of Colorado.