Maximum likelihood estimation of covariance parameters for Bayesian atmospheric trace gas surface flux inversions



[1] This paper introduces a Maximum Likelihood (ML) approach for estimating the statistical parameters required for the covariance matrices used in the solution of Bayesian inverse problems aimed at estimating surface fluxes of atmospheric trace gases. The method offers an objective methodology for populating the covariance matrices required in Bayesian inversions, thereby resulting in better estimates of the uncertainty associated with derived fluxes and minimizing the risk of inversions being biased by unrealistic covariance parameters. In addition, a method is presented for estimating the uncertainty associated with these covariance parameters. The ML method is demonstrated using a typical inversion setup with 22 flux regions and 75 observation stations from the National Oceanic and Atmospheric Administration-Climate Monitoring and Diagnostics Laboratory (NOAA-CMDL) Cooperative Air Sampling Network with available monthly averaged carbon dioxide data. Flux regions and observation locations are binned according to various characteristics, and the variances of the model-data mismatch and of the errors associated with the a priori flux distribution are estimated from the available data.

1. Introduction

[2] The use of inverse modeling methods as tools for estimating fluxes of atmospheric trace gases has become increasingly common as the need to constrain their global and regional budgets has been recognized [Intergovernmental Panel on Climate Change (IPCC), 2001; Committee on the Science of Climate Change, Division on Earth and Life Studies, National Research Council, 2001; Wofsy and Harriss, 2002]. Inverse methods attempt to deconvolute the effects of atmospheric transport and recover source fluxes (typically surface fluxes) based on atmospheric measurements. Information about regions that are not being directly sampled can potentially be inferred from downwind atmospheric measurements. Inverse modeling methods have been used to estimate regional contributions to global budgets of trace gases such as CFCs, CH4, and CO2, and a review of recent applications is presented by Enting [2002, chap. 14–17].

[3] A vast majority of recent inverse modeling studies have relied on a classical Bayesian approach, where the solution to the inverse problem is defined as the set of flux values that represent an optimal balance between two requirements. The first criterion is that the optimized, or a posteriori, fluxes should be as close as possible to the first-guess (a priori) fluxes. The second is that the measurement values that would result from the inversion-derived (a posteriori) fluxes should agree as closely as possible with the actual measured concentrations. For the case where the advection scheme is linear, this solution corresponds to the minimum of a cost function Ls defined as:

$$L_s = \frac{1}{2}(z - Hs)^T R^{-1} (z - Hs) + \frac{1}{2}(s - s_p)^T Q^{-1} (s - s_p) \tag{1}$$

where z is an n × 1 vector of observations, H is an n × m Jacobian matrix representing the sensitivity of the observations z to the function s (i.e., $H_{i,j} = \partial z_i / \partial s_j$), s is an m × 1 vector of the discretized unknown surface flux distribution, R is the n × n model-data mismatch covariance, $s_p$ is the m × 1 prior estimate of the flux distribution s, Q is the m × m covariance of the errors associated with the prior estimate $s_p$, and the superscript T denotes the matrix transpose operation. A solution in the form of a superposition of all statistical distributions involved can be computed, from which a posteriori means and covariances can be derived [e.g., Enting et al., 1995]. The solution is [Tarantola, 1987; Enting, 2002],

$$\hat{s} = s_p + Q H^T (H Q H^T + R)^{-1} (z - H s_p) \tag{2}$$
$$V_{\hat{s}} = Q - Q H^T (H Q H^T + R)^{-1} H Q \tag{3}$$

where $\hat{s}$ is the posterior best estimate of s, and $V_{\hat{s}}$ is its posterior covariance.
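As a rough illustration of equations (2) and (3), the following sketch computes the posterior mean and covariance for a small synthetic problem. The function name `bayesian_inversion`, the problem sizes, and all numerical values are assumptions made for this example and do not come from the study.

```python
import numpy as np

def bayesian_inversion(z, H, s_p, R, Q):
    """Posterior mean and covariance of the fluxes, equations (2) and (3)."""
    Psi = H @ Q @ H.T + R                      # n x n covariance of (z - H s_p)
    # Q H^T Psi^{-1}, obtained with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(Psi, H @ Q).T       # valid because Psi and Q are symmetric
    s_hat = s_p + gain @ (z - H @ s_p)
    V_s = Q - gain @ H @ Q
    return s_hat, V_s

# Synthetic illustration: sizes and variances are arbitrary, not from the paper
rng = np.random.default_rng(0)
n, m = 8, 3
H = rng.normal(size=(n, m))
s_p = np.zeros(m)
Q = np.diag(np.full(m, 2.0**2))
R = np.diag(np.full(n, 0.5**2))
s_true = s_p + rng.normal(scale=2.0, size=m)
z = H @ s_true + rng.normal(scale=0.5, size=n)

s_hat, V_s = bayesian_inversion(z, H, s_p, R, Q)
print("posterior mean:", s_hat)
print("posterior std: ", np.sqrt(np.diag(V_s)))
```

The linear solve avoids forming the inverse of $HQH^T + R$ explicitly, which is usually preferable numerically when n is large.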

[4] In most Bayesian studies, both R and Q have been modeled as diagonal matrices. In this case, the diagonal elements of R represent the model-data mismatch variance of each observation, which is the sum of the variances associated with all error components such as, for instance, the observation error, the transport modeling error, and the representation error. The diagonal elements of Q, on the other hand, represent the error variance of the prior flux estimates and specify the extent to which the real fluxes are expected to deviate from prior flux estimates. It is important to note that although the vast majority of Bayesian inversion studies have used diagonal covariance matrices, some errors are known to be both temporally and spatially correlated.

[5] One of the challenges of the Bayesian approach is the need to estimate the parameters defining the model-data mismatch covariance matrix R and the prior error covariance matrix Q. These covariances determine the relative weight of prior information versus available data in estimating individual fluxes, and are therefore key components in estimating the posterior covariance (and thereby the uncertainty) of these fluxes. As a result, identifying appropriate covariance parameters is essential to accurate flux estimation. The importance and challenge of accurately estimating these parameters [Kaminski et al., 1999; Rayner et al., 1999; Law et al., 2002; Peylin et al., 2002; Engelen et al., 2002] and the lack of objective methods for doing so [Rayner et al., 1999] have been increasingly recognized in the literature.

[6] Past inversion studies have relied on a variety of methods for estimating these parameters. In the original synthesis inversion work of Enting et al. [1995], the data uncertainty was based on the statistical characteristics of the NOAA flask sampling procedures [Tans et al., 1990], but the uncertainty associated with errors in the atmospheric transport model could not be quantified. Many more recent studies have derived the model-data mismatch from the residual standard deviation of flask samples around a smooth curve fit [e.g., Hein et al., 1997; Bousquet et al., 1999; Gurney et al., 2002; Peylin et al., 2002]. Others have relied on values independently obtained from the literature [e.g., Kandlikar, 1997]. The a priori flux errors have been even more difficult to quantify [Kaminski et al., 1999], and the choice of prior errors has been described as “mostly arbitrary” in some studies [Bousquet et al., 1999], although it is recognized that these parameters are crucial to the inversion. Often, researchers have applied what are considered to be “loose” priors [e.g., Peylin et al., 2002; Law et al., 2002] in order to yield conservative estimates of the flux uncertainties. On the basis of assessments of available data, oceanic fluxes have usually been considered more certain than terrestrial fluxes [e.g., Kaminski et al., 1999]. Although considerable effort has been put into estimating covariance parameters, the specification of the prior uncertainties has been described as the “greatest single weakness” in some studies [Rayner et al., 1999].

[7] Recently, several researchers have scaled the model-data mismatch and/or prior error covariance parameters to obtain a data misfit function that follows a $\chi^2$ distribution with a given number of degrees of freedom [e.g., Rayner et al., 1999; Gurney et al., 2002; Peylin et al., 2002; Rödenbeck et al., 2003]. The correct number of degrees of freedom to be used in such an analysis is equal to the total number of independent pieces of information introduced into the system (equal to the number of observational data plus the number of prior flux estimates in the case of diagonal Q and R matrices), minus the number of variables estimated in the inversion (typically equal to the number of estimated fluxes). If the covariance parameters are reasonable, the sum of the squared residuals, scaled by their uncertainties and normalized by the number of degrees of freedom, should be close to 1 (reduced chi-squared $\chi_r^2 = 1$). Some researchers have applied this test to the residuals between the available observations and those that would result from the a posteriori fluxes obtained from the inversion, while others have applied it both to these observation residuals and to residuals of the a posteriori fluxes from their prior estimates. Such tuning, however, does not yield a unique solution because more than one combination of covariance parameters can lead to the residuals having an acceptable $\chi_r^2$. In addition, looking at the $\chi_r^2$ cannot guide the relative allocation of error between the model-data mismatch and prior flux estimates [Rayner et al., 1999]. Therefore, examining the variance of the residuals is a necessary, but not a sufficient, condition for evaluating the appropriateness of covariance parameters.

[8] A few recent studies have attempted to systematically quantify one or more components of the model-data mismatch. Engelen et al. [2002] divided the model-data mismatch covariance into four additive covariances (which were modeled as diagonal matrices) representing: (1) the observation error, (2) the error in mapping concentrations at a specific location into the measured quantity, (3) the model transport error, and (4) the representation error describing the effects of model resolution. Each of these error sources was then quantified based either on available additional information (e.g., known precision of analytical methods) or numerical experiments (e.g., comparing inversion results obtained using various transport models in order to estimate transport error). Kaminski et al. [2001] focused on the errors introduced as a result of imposing potentially erroneous fixed flux patterns within regions, and estimated the additional variance that should be added to the model-data mismatch covariance to account for this effect. Krakauer et al. [2004] estimated single scaling factors for both the model-data mismatch and prior error covariances using a generalized cross-validation approach, to show that the Transcom inversion results for the global land/ocean partitioning, particularly in the Southern Hemisphere and tropical regions, were influenced strongly by the Transcom choice of parameters. The fluxes resulting from this analysis and their associated residuals were not analyzed to determine whether these parameters result in fluxes that are consistent with the underlying statistical assumptions, and the uncertainty associated with the estimated covariance parameters was not quantified.

[9] The method that will be developed in this paper allows the available data to shed light on the covariance parameters to be used in the inversion, both in defining the model-data mismatch and the error associated with the prior flux distribution. This can be done in a consistent manner by identifying the Maximum Likelihood (ML) estimates of these parameters, given the prior flux estimates sp, the available measurements z, and the sensitivity H of these measurements to the fluxes to be estimated. The method is applicable whenever estimates of the covariance parameters are to be obtained from the atmospheric data themselves. A ML approach has previously been used in estimating spatial drift and covariance parameters of hydrologic variables, based on limited measurements of these parameters [Kitanidis and Lane, 1985]. Also, a related restricted maximum likelihood (RML) approach has been used for estimating covariance parameters in geostatistical inverse modeling [e.g., Kitanidis, 1995; Michalak and Kitanidis, 2004]. Recently, this technique was applied to the estimation of covariance parameters in a geostatistical implementation of the atmospheric trace-gas inversion problem [Michalak et al., 2004]. In this paper, we develop and demonstrate a maximum likelihood methodology for estimating the model-data mismatch and prior-error covariance matrix parameters needed in the application of the classical Bayesian inverse modeling approach. This method can be directly applied to all studies where the objective function is of the form presented in equation (1), which is the most common setup currently being used in atmospheric inversion studies. This method allows, for the first time, for an objective and data-driven estimation of both the model-data mismatch and prior error covariance parameters required for the solution of the flux estimation inverse problem. 
Several covariance parameters can be estimated simultaneously, the resulting fluxes yield residuals that follow the assumed distribution (e.g., $\chi_r^2 = 1$), and the uncertainty associated with the covariance parameters can also be estimated. In the sample application presented in this work, both R and Q are modeled as diagonal matrices, but the method is directly applicable to cases where these matrices include spatial and temporal correlation (i.e., off-diagonal terms). In such cases, parameters such as the correlation length of flux deviations from their prior estimates can also be estimated.

2. Maximum Likelihood Estimation

[10] We are interested in finding the maximum likelihood estimate of the model-data mismatch and prior error covariance parameters in Bayesian inverse modeling. Note that throughout this discussion, the term prior errors refers to the errors between the actual fluxes and the prior flux estimates. In the case where both the model-data mismatch matrix and the prior error covariance matrix are diagonal matrices of variances, this can be thought of as optimizing the variances in

$$R = \mathrm{diag}\left(\sigma_{R,1}^2, \ldots, \sigma_{R,n}^2\right), \qquad Q = \mathrm{diag}\left(\sigma_{Q,1}^2, \ldots, \sigma_{Q,m}^2\right) \tag{4}$$

where $\sigma_{R,i}^2$ are the model-data mismatch variances of individual observations and $\sigma_{Q,i}^2$ are the prior-error variances of individual prior flux estimates. In the simplest case, we could assume that a single variance describes the model-data mismatch at all sites, and another variance describes the prior error for all source regions, yielding a total of two parameters to be estimated. At the other end of the spectrum, a different model-data mismatch variance could be estimated for each measurement location, and a different prior-error variance for each source region. Other options would be to populate the covariance matrices with scaling factors that we believe to be representative of the relative errors of certain measurements or priors and solve for overall proportionality constants that would adjust the magnitude of these variances, or to estimate the covariance (in space and/or in time) of the errors. Various options will be demonstrated in section 3 of this paper. It is important to note that one would not go as far as estimating a different variance for each individual flask measurement, because one cannot estimate a variance using a single measurement.
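The grouping of variances described above can be sketched as follows. This is only a minimal illustration: the helper `build_covariances`, the group labels, and the standard deviations are hypothetical, and the parameters are passed as standard deviations rather than variances purely for readability.

```python
import numpy as np

def build_covariances(sigma_R, sigma_Q, obs_group, flux_group):
    """Diagonal R and Q from one standard deviation per group.

    sigma_R[g] applies to every observation with obs_group == g,
    sigma_Q[g] to every flux with flux_group == g.
    """
    var_R = np.asarray(sigma_R, dtype=float)[obs_group] ** 2
    var_Q = np.asarray(sigma_Q, dtype=float)[flux_group] ** 2
    return np.diag(var_R), np.diag(var_Q)

# Hypothetical grouping: 5 observations in 3 station groups, 3 fluxes in 2 region groups
obs_group = np.array([0, 0, 1, 2, 1])
flux_group = np.array([0, 1, 1])
R, Q = build_covariances([0.5, 1.5, 3.0], [1.0, 2.0], obs_group, flux_group)
print(np.diag(R))   # variances per observation
print(np.diag(Q))   # variances per flux
```

With this layout, the two-parameter case corresponds to a single group for all observations and a single group for all fluxes.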

2.1. Probability Density Function

[11] We are interested in the probability density function (pdf) of the covariance parameters of R and Q, which we will jointly call θ, given the available observations z, the prior estimates $s_p$, and the transport matrix H. In the example given in equation (4), $\theta = \{\sigma_{R,1}^2, \ldots, \sigma_{R,n}^2, \sigma_{Q,1}^2, \ldots, \sigma_{Q,m}^2\}$, which would be binned into subgroups based on flux regions and measurement stations, and potentially further grouped by regions or stations that are expected to have similar properties (e.g., marine boundary layer stations). According to Bayes' rule, the pdf of the parameters θ can be defined as

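$$p''(\theta \mid H, s_p, z) = \frac{p(z \mid H, s_p, \theta)\, p'(\theta \mid H, s_p)}{\int p(z \mid H, s_p, \theta)\, p'(\theta \mid H, s_p)\, d\theta} \tag{5}$$

where the denominator is simply a normalizing constant equal to the probability of the data $p(z \mid H, s_p)$, and where a prime denotes a prior distribution while a double prime denotes a posterior distribution. We assume that, a priori, the probability of the parameters θ is uniform over all values, $p'(\theta \mid H, s_p) \propto 1$, which corresponds to the assumption that we do not have any prior knowledge about these parameters. Then

$$p''(\theta \mid H, s_p, z) \propto p(z \mid H, s_p, \theta) \tag{6}$$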

yielding the likelihood function of θ. In other words, if we assume that we have no prior information on θ, its pdf is defined by the likelihood of the observations, and the best estimate corresponds to the set of values that maximize this likelihood.

[12] The observations z are modeled as

$$z = Hs + \varepsilon_z \tag{7}$$

where $\varepsilon_z$ is a vector representing the model-data mismatch, typically modeled as an n × 1 vector of zero-mean, normally distributed random errors with variance(s) defined by the matrix R. A priori, the expected value of z is

$$E[z] = H s_p \tag{8}$$

The covariance matrix of z is

$$E\left[(z - Hs_p)(z - Hs_p)^T\right] = HQH^T + R \tag{9}$$

This expected value and covariance can be used to define the Gaussian probability density function $p(z \mid H, s_p, \theta)$, which, from (6), is proportional to $p''(\theta \mid H, s_p, z)$:

$$p(z \mid H, s_p, \theta) \propto \left|HQH^T + R\right|^{-1/2} \exp\left[-\frac{1}{2}(z - Hs_p)^T (HQH^T + R)^{-1} (z - Hs_p)\right] \tag{10}$$

where $|\cdot|$ denotes a matrix determinant, and Q and R are functions of θ. Note that this pdf is Gaussian with respect to z but not with respect to θ. Also, this pdf is independent of the actual flux distribution s.
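The a priori mean and covariance of z can be checked numerically. The sketch below draws many synthetic flux vectors and mismatch errors for a made-up three-observation, two-region problem and compares the sample statistics of z with $Hs_p$ and $HQH^T + R$. All matrices and values here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small made-up problem: 3 observations, 2 flux regions
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
s_p = np.array([1.0, -1.0])
Q = np.diag([1.0, 4.0])          # prior-error variances
R = 0.25 * np.eye(3)             # model-data mismatch variances

n_draws = 200_000
s = s_p + rng.normal(size=(n_draws, 2)) * np.sqrt(np.diag(Q))   # fluxes ~ N(s_p, Q)
eps = rng.normal(size=(n_draws, 3)) * np.sqrt(np.diag(R))       # mismatch ~ N(0, R)
z = s @ H.T + eps                                               # one realization of z per row

mean_z = z.mean(axis=0)
cov_z = np.cov(z, rowvar=False)
Psi = H @ Q @ H.T + R

print("sample mean of z:", np.round(mean_z, 2), " expected:", H @ s_p)
print("sample covariance of z:\n", np.round(cov_z, 2))
print("H Q H^T + R:\n", Psi)
```

Even though Q and R are diagonal here, the covariance of z has off-diagonal terms, because an error in one flux region is sampled by several observations.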

2.2. Best Estimate

[13] In order to obtain the maximum likelihood estimate of the covariance parameters, we want to maximize equation (10) with respect to θ, or alternately, minimize its negative logarithm,

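$$L_\theta = \frac{1}{2}\ln\left|HQH^T + R\right| + \frac{1}{2}(z - Hs_p)^T (HQH^T + R)^{-1} (z - Hs_p) \tag{11}$$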

Note that higher variance values in Q and R will tend to increase the value of the first term and decrease the value of the second term in this equation. The θ values that minimize this overall objective function are the ML estimates of these parameters.
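A minimal sketch of this objective function is given below, assuming synthetic data generated with known standard deviations. The function name `neg_log_likelihood` and every numerical value are illustrative assumptions. Scanning the model-data mismatch standard deviation shows the trade-off between the log-determinant term and the quadratic term.

```python
import numpy as np

def neg_log_likelihood(z, H, s_p, R, Q):
    """Negative log-likelihood of the covariance parameters, without additive constants."""
    Psi = H @ Q @ H.T + R
    r = z - H @ s_p
    _, logdet = np.linalg.slogdet(Psi)          # stable log-determinant
    return 0.5 * logdet + 0.5 * r @ np.linalg.solve(Psi, r)

# Synthetic data generated with known standard deviations (illustrative values only)
rng = np.random.default_rng(2)
n, m = 200, 5
H = rng.normal(size=(n, m))
s_p = np.zeros(m)
sigma_R_true, sigma_Q_true = 1.0, 2.0
s_true = s_p + rng.normal(scale=sigma_Q_true, size=m)
z = H @ s_true + rng.normal(scale=sigma_R_true, size=n)

Q = sigma_Q_true**2 * np.eye(m)
for sigma_R in (0.1, 0.5, 1.0, 2.0, 10.0):
    L = neg_log_likelihood(z, H, s_p, sigma_R**2 * np.eye(n), Q)
    print(f"sigma_R = {sigma_R:5.1f}   L_theta = {L:10.2f}")
```

Values of sigma_R far below or far above the one used to generate the data give a larger objective, the former through the quadratic term and the latter through the log-determinant term.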

[14] Conceptually, differences between the available observations and observations predicted by the prior flux estimates result from two types of error. First, the model-data mismatch error, parameterized by the covariance matrix R, results from random measurement and transport errors. Second, the prior error that is parameterized by the covariance matrix Q accounts for more systematic errors in the predicted observations because, as a result of the mixing that takes place as gases are transported downwind, error in a single flux region will be sampled at multiple observation locations. Although these two sources of error cannot be separated perfectly, the maximum likelihood approach offers a rigorous method for estimating the contributions of these two forms of error. Looking at the objective function in equation (11), the model-data mismatch error in R is included directly in the optimization, whereas the prior error in Q is ‘filtered’ using the transport matrix H. Therefore, the effect of the model-data mismatch is expected to behave as an independent, identically distributed (i.i.d.) additive error for the case of a diagonal R, whereas the effect of the prior error will have a more systematic effect on a group of observations. The intuitive separation between the model-data mismatch and prior error is illustrated in Figure 1.

Figure 1.

This figure presents a conceptualization of (a) a prior flux estimate and (b, c, d) three possible scenarios of how actual observations relate to this prior flux (transported to the time when observations are made). It illustrates the difference between the effects of model-data mismatch error and prior error. The prior estimate of a flux is depicted in Figure 1a. After a given time, the compound is advected (assuming only advection and no mixing) to a downwind location. On the basis of the prior estimate in Figure 1a, the a priori predicted distribution at the new time is presented as the thick line in Figures 1b, 1c, and 1d. The effect of model-data mismatch is expected to be relatively independent for each observation and is presented in Figure 1b. The effect of prior error in the magnitude of the flux is correlated among several observations and is presented in Figure 1c. Their combined effect is presented in Figure 1d.

2.3. Parameter Uncertainty

[15] An estimate of the uncertainty of parameters θ can be obtained from the inverse of the Hessian of equation (11),

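$$V_\theta = \mathcal{H}^{-1} \tag{12}$$

where the Hessian is defined as

$$\mathcal{H}_{ij} = \frac{\partial^2 L_\theta}{\partial \theta_i\, \partial \theta_j} \tag{13}$$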

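and has dimensions p × p, where p is the total number of parameters to be estimated. In order to simplify the notation, we define the additional variables,

$$\Psi = HQH^T + R \tag{14}$$
$$\Psi_i = \frac{\partial \Psi}{\partial \theta_i} \tag{15}$$
$$\Psi_{ij} = \frac{\partial^2 \Psi}{\partial \theta_i\, \partial \theta_j} \tag{16}$$

which all have dimensions n × n. Also, we will use the following properties [Schweppe, 1973]:

$$\frac{\partial}{\partial \theta_i} \ln\left|\Psi\right| = \mathrm{Tr}\left[\Psi^{-1} \Psi_i\right] \tag{17}$$
$$\frac{\partial \Psi^{-1}}{\partial \theta_i} = -\Psi^{-1} \Psi_i \Psi^{-1} \tag{18}$$

where Tr is the “trace” operator, which is the sum of the diagonal elements of a matrix. Given these additional variables, the Hessian becomes

$$\mathcal{H}_{ij} = \frac{1}{2}\mathrm{Tr}\left[\Psi^{-1}\Psi_{ij} - \Psi^{-1}\Psi_i\Psi^{-1}\Psi_j\right] + (z - Hs_p)^T \left[\Psi^{-1}\Psi_i\Psi^{-1}\Psi_j\Psi^{-1} - \frac{1}{2}\Psi^{-1}\Psi_{ij}\Psi^{-1}\right] (z - Hs_p) \tag{19}$$

Note that in the case where both R and Q are diagonal matrices of variances, $\Psi_{ij} = [0]$, and $\mathcal{H}_{ij}$ simplifies to

$$\mathcal{H}_{ij} = -\frac{1}{2}\mathrm{Tr}\left[\Psi^{-1}\Psi_i\Psi^{-1}\Psi_j\right] + (z - Hs_p)^T \Psi^{-1}\Psi_i\Psi^{-1}\Psi_j\Psi^{-1} (z - Hs_p) \tag{20}$$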

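[16] In some other cases, the second derivative $\Psi_{ij}$ may be difficult or expensive to compute, in which case the covariance of the parameters can be approximated using the Cramér-Rao inequality [Schweppe, 1973] by

$$V_\theta \approx \mathcal{F}^{-1} \tag{21}$$

where $\mathcal{F}_{ij}$ is the Fisher information matrix and is the expected value, with respect to z, of the Hessian $\mathcal{H}_{ij}$:

$$\mathcal{F}_{ij} = E\left[\mathcal{H}_{ij}\right] = \frac{1}{2}\mathrm{Tr}\left[\Psi^{-1}\Psi_i\Psi^{-1}\Psi_j\right] \tag{22}$$

Here, as in equations (14) and (15), $\Psi = HQH^T + R$ and $\Psi_i = \partial\Psi/\partial\theta_i$.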

The diagonal elements of Vθ represent the uncertainty of the estimates of the covariance parameters, as defined by their estimation variance. The uncertainty of the covariance parameters could be incorporated into the subsequent inversion by drawing an ensemble of samples of the covariance parameters based on their best estimates and uncertainty, and solving the inversion for each set of parameters [e.g., Kitanidis, 1986].
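The Fisher-information approximation can be sketched for a two-parameter case in which the parameters are a single model-data mismatch variance and a single prior-error variance. The function `fisher_information`, the problem sizes, and the parameter values are assumptions for illustration. For this parameterization the derivatives of $HQH^T + R$ with respect to the two variances are the identity matrix and $HH^T$.

```python
import numpy as np

def fisher_information(Psi, dPsi):
    """F_ij = 0.5 * Tr(Psi^{-1} Psi_i Psi^{-1} Psi_j) for a list of derivatives Psi_i."""
    A = [np.linalg.solve(Psi, D) for D in dPsi]          # Psi^{-1} Psi_i
    p = len(A)
    F = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            # Tr(A_i A_j) computed as an elementwise sum of A_i * A_j^T
            F[i, j] = F[j, i] = 0.5 * np.sum(A[i] * A[j].T)
    return F

# Two-parameter example: theta = (sigma_R^2, sigma_Q^2), made-up sizes and values
rng = np.random.default_rng(3)
n, m = 60, 6
H = rng.normal(size=(n, m))
theta = np.array([0.5**2, 2.0**2])
Psi = theta[1] * (H @ H.T) + theta[0] * np.eye(n)
dPsi = [np.eye(n), H @ H.T]                              # dPsi/dsigma_R^2, dPsi/dsigma_Q^2

F = fisher_information(Psi, dPsi)
V_theta = np.linalg.inv(F)                               # approximate parameter covariance
print("approx. std of the variance estimates:", np.sqrt(np.diag(V_theta)))
```

In this toy setup the prior-error variance is constrained by only six fluxes, so its estimated uncertainty is much larger relative to its value than that of the mismatch variance.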

2.4. Gauss-Newton Method

[17] In general, an iterative method is needed to find the minimum of equation (11) with respect to the vector of values θ. One efficient option is the Gauss-Newton algorithm [e.g., Gill et al., 1986]. Starting from an initial estimate of θ denoted $\hat{\theta}_k$, the algorithm proceeds as

$$\hat{\theta}_{k+1} = \hat{\theta}_k - \mathcal{F}_k^{-1} g_k \tag{23}$$

where $\mathcal{F}$ is the expected value, with respect to z, of the Hessian $\mathcal{H}$ of the likelihood function, $\mathcal{H}_{ij}$ and $\mathcal{F}_{ij}$ are as defined in equations (19) and (22), g is a vector of the first derivatives of the likelihood function $L_\theta$ with respect to θ,

$$g_i = \frac{\partial L_\theta}{\partial \theta_i} = \frac{1}{2}\mathrm{Tr}\left[\Psi^{-1}\Psi_i\right] - \frac{1}{2}(z - Hs_p)^T \Psi^{-1}\Psi_i\Psi^{-1} (z - Hs_p) \tag{24}$$

with $\Psi = HQH^T + R$ and $\Psi_i = \partial\Psi/\partial\theta_i$, and the subscript k in g and $\mathcal{F}$ means that they are calculated using $\hat{\theta}_k$. The indices take on the values i = 1,…, p and j = 1,…, p, where p is the total number of parameters θ to be estimated. Note that $g_k$ is calculated using the latest estimate $\hat{\theta}_k$. In the special case where both R and Q are linear in θ, the derivatives $\Psi_i$ are constant matrices that do not depend on $\hat{\theta}_k$.
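The iteration can be sketched for covariances that are linear in the parameters, so that $HQH^T + R$ is a weighted sum of fixed derivative matrices. Everything named here (`objective`, `gradient_and_fisher`, `gauss_newton`, the synthetic data, the starting point) is an assumption made for illustration. The step-halving safeguard that keeps the variances positive and prevents the objective from increasing is an addition of this sketch and is not part of the update described in the text.

```python
import numpy as np

def objective(theta, r, dPsi):
    """Negative log-likelihood for Psi = sum_i theta_i * dPsi[i] (linear in theta)."""
    Psi = sum(t * D for t, D in zip(theta, dPsi))
    _, logdet = np.linalg.slogdet(Psi)
    return 0.5 * logdet + 0.5 * r @ np.linalg.solve(Psi, r)

def gradient_and_fisher(theta, r, dPsi):
    Psi = sum(t * D for t, D in zip(theta, dPsi))
    A = [np.linalg.solve(Psi, D) for D in dPsi]            # Psi^{-1} Psi_i
    w = np.linalg.solve(Psi, r)                            # Psi^{-1} (z - H s_p)
    g = np.array([0.5 * np.trace(Ai) - 0.5 * w @ D @ w for Ai, D in zip(A, dPsi)])
    F = np.array([[0.5 * np.sum(Ai * Aj.T) for Aj in A] for Ai in A])
    return g, F

def gauss_newton(theta0, r, dPsi, max_iter=50, tol=1e-8):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g, F = gradient_and_fisher(theta, r, dPsi)
        step = -np.linalg.solve(F, g)                      # full update: -F^{-1} g
        # Safeguard (an addition of this sketch): halve the step until the
        # variances stay positive and the objective does not increase.
        L_old, scale = objective(theta, r, dPsi), 1.0
        while scale > 1e-6:
            trial = theta + scale * step
            if np.all(trial > 0) and objective(trial, r, dPsi) <= L_old:
                break
            scale *= 0.5
        else:
            return theta                                   # no acceptable step found
        if np.max(np.abs(trial - theta)) < tol * np.max(theta):
            return trial
        theta = trial
    return theta

# Synthetic data with made-up true variances (sigma_R^2, sigma_Q^2)
rng = np.random.default_rng(4)
n, m = 120, 30
H = rng.normal(size=(n, m))
s_p = np.zeros(m)
theta_true = np.array([0.5**2, 2.0**2])
z = H @ (s_p + rng.normal(scale=2.0, size=m)) + rng.normal(scale=0.5, size=n)
r = z - H @ s_p
dPsi = [np.eye(n), H @ H.T]                                # derivatives w.r.t. each variance

theta0 = np.array([1.0, 1.0])
theta_hat = gauss_newton(theta0, r, dPsi)
print("true variances:     ", theta_true)
print("estimated variances:", theta_hat)
```

Using the Fisher information in place of the full Hessian keeps the step matrix positive semidefinite, which is one reason this type of update tends to behave well for variance parameters.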

2.5. Method Validation

[18] In order to demonstrate and validate the proposed method, sample applications are presented in section 3. For each case, once the covariance parameters and their uncertainties have been estimated, the results are examined by performing a standard Bayesian inversion (equations (2) and (3)) using the estimated parameters. The residuals from these inversions are analyzed to ensure that the method has successfully identified covariance parameters consistent with the statistical setup of the inversion.

[19] As discussed by Tarantola [1987], the squared residuals from the inversions should follow a $\chi^2$ distribution. If both R and Q are diagonal and the residuals are calculated using the posterior best estimate of the flux distribution, the sum of the squared data and flux residuals, normalized by the variances in R and Q, should follow a $\chi^2$ distribution with n degrees of freedom [Tarantola, 1987]. This results from the fact that the residuals are not independent, and there are only n degrees of freedom among the n + m residuals for the case where the R and Q matrices are diagonal. This criterion is applicable to the full set of residuals, and cannot be applied directly to only flux or observation residuals, or to residuals from individual stations or regions. Note that the covariance parameters are treated as deterministic parameters in the inversion step, and the number of covariance parameters therefore does not affect the number of degrees of freedom of the residual $\chi^2$ distribution.

[20] If, instead of using the best estimates of the fluxes, the residuals from prior fluxes and observations are calculated using conditional realizations of the a posteriori fluxes (see Appendix A), the residuals are expected to follow the statistical distributions specified in the covariance matrices R and Q. If these matrices are diagonal, this implies that the residuals from the actual fluxes (and from the conditional realizations) are expected to be independent. The conditional realizations represent the range of possible flux distributions, given the assumptions and data incorporated into the inversion. For the case where residuals are expected to be independent, the number of degrees of freedom is equal to the number of residuals. Therefore the sum of the squared residuals from individual conditional realizations, normalized by the variances in R and Q, should follow a $\chi^2$ distribution with n + m degrees of freedom. More importantly, because the residuals should follow the distributions specified in R and Q, which do not include a cross-covariance between data and flux residuals, observation residuals can be analyzed separately from flux residuals to determine whether both the observations and the prior fluxes are reproduced to the extent assumed by the parameters in the model-data mismatch and prior error covariance matrices. In addition, residuals from individual stations and/or regions can also be analyzed. Whether residuals are calculated from the best estimate or conditional realizations, the $\chi_r^2$ statistic (with the appropriate number of degrees of freedom) should be equal to 1 in the case of a sufficiently large number of observations.

[21] Once a conditional realization $s_{ci}$ has been generated, the $\chi_r^2$ statistic can be calculated both for the observations and the prior flux estimates,

$$\chi_{r,z}^2 = \frac{1}{n}(z - Hs_{ci})^T R^{-1} (z - Hs_{ci}) \tag{25}$$
$$\chi_{r,s}^2 = \frac{1}{m}(s_{ci} - s_p)^T Q^{-1} (s_{ci} - s_p) \tag{26}$$

where $\chi_{r,z}^2$ is the statistic for the full set of observations used in the inversion, and $\chi_{r,s}^2$ is the statistic for the full set of fluxes estimated in the inversion. As mentioned earlier, a $\chi_r^2 = 1$ statistic is not a sufficient condition for identifying the optimal set of covariance parameters, but it is a necessary condition. Therefore, although this statistic is not used in estimating the covariance parameters, inversions using the optimal parameter values should yield residuals with $\chi_r^2 = 1$.
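The two statistics can be illustrated on synthetic data. The procedure of Appendix A is not reproduced here; this sketch assumes that a conditional realization may be drawn as a Gaussian sample around the posterior best estimate with the posterior covariance, using a Cholesky factor. The helper `reduced_chi2` and all sizes and values are likewise assumptions.

```python
import numpy as np

def reduced_chi2(resid, cov):
    """resid^T cov^{-1} resid divided by the number of residuals."""
    return float(resid @ np.linalg.solve(cov, resid)) / resid.size

rng = np.random.default_rng(5)
n, m = 500, 30
H = rng.normal(size=(n, m))
s_p = np.zeros(m)
Q = 2.0**2 * np.eye(m)            # made-up prior-error variance
R = 0.5**2 * np.eye(n)            # made-up model-data mismatch variance
z = H @ (s_p + rng.normal(scale=2.0, size=m)) + rng.normal(scale=0.5, size=n)

# Posterior best estimate and covariance of the fluxes
Psi = H @ Q @ H.T + R
gain = np.linalg.solve(Psi, H @ Q).T
s_hat = s_p + gain @ (z - H @ s_p)
V_s = Q - gain @ H @ Q
V_s = 0.5 * (V_s + V_s.T)         # remove round-off asymmetry before factorizing

# ASSUMPTION: draw the realization as s_hat + C u, with C C^T = V_s and u ~ N(0, I)
C = np.linalg.cholesky(V_s)
s_ci = s_hat + C @ rng.normal(size=m)

chi2_z = reduced_chi2(z - H @ s_ci, R)       # observation residuals
chi2_s = reduced_chi2(s_ci - s_p, Q)         # flux residuals
print(f"chi2_r,z = {chi2_z:.2f}   chi2_r,s = {chi2_s:.2f}")
```

Because the data were generated with the same variances that are used in R and Q, both statistics should scatter around 1, with more scatter for the flux statistic since it is based on far fewer residuals.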

[22] Similarly, a $\chi_r^2$ statistic can be calculated for individual stations and/or flux regions. Because we are classifying observation locations and flux regions into a small number of groups, we do not expect the $\chi_r^2$ for individual stations or regions to be exactly equal to one. This is because this setup yields a single variance for a subset of stations/regions, whereas in reality these stations/regions may exhibit slightly different error characteristics. Instead, if we have grouped the observation locations and flux regions in a manner that yields groups with similar characteristics, we expect the $\chi_r^2$ statistic for individual locations and regions to be distributed around 1. As we allow for more groups (i.e., more estimated covariance parameters) the $\chi_r^2$ statistic of individual members of these groups will cluster more closely around 1. The $\chi_r^2$ statistics for individual locations ($\chi_{r,z_j}^2$) and individual flux regions ($\chi_{r,s_k}^2$) are defined as

$$\chi_{r,z_j}^2 = \frac{1}{n_j}(z_j - H_j s_{ci})^T R_j^{-1} (z_j - H_j s_{ci}) \tag{27}$$
$$\chi_{r,s_k}^2 = \frac{1}{m_k}(s_{ci,k} - s_{p,k})^T Q_k^{-1} (s_{ci,k} - s_{p,k}) \tag{28}$$

where $z_j$ are the $n_j$ observations taken at a given location j, $H_j$ is the transport matrix relating these observations to all fluxes, $R_j$ is the portion of the model-data mismatch covariance matrix relating to observations $z_j$, $s_k$ are the $m_k$ flux components for a given region k, $s_{ci,k}$ and $s_{p,k}$ are the flux conditional realization values and the prior flux estimates for the same region, and $Q_k$ is the portion of the prior error covariance matrix relating to fluxes $s_k$.
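For diagonal covariance matrices, the per-station and per-region statistics reduce to group-wise averages of squared residuals divided by their variances. The sketch below shows this with a hypothetical helper `reduced_chi2_by_group` and made-up residuals; a non-diagonal covariance would require the full inverse of the corresponding block instead.

```python
import numpy as np

def reduced_chi2_by_group(resid, variances, group):
    """Mean of squared residuals scaled by their variances, per group label.

    Assumes a diagonal covariance, so the inverse of each block reduces to 1/variance.
    """
    scaled = np.asarray(resid, dtype=float) ** 2 / np.asarray(variances, dtype=float)
    group = np.asarray(group)
    return {int(g): float(scaled[group == g].mean()) for g in np.unique(group)}

# Hypothetical observation residuals for two stations (labels 0 and 1)
resid = np.array([1.0, -1.0, 2.0, 0.0])
variances = np.array([1.0, 1.0, 4.0, 4.0])
station = np.array([0, 0, 1, 1])
print(reduced_chi2_by_group(resid, variances, station))   # {0: 1.0, 1: 0.5}
```

The same function applies to flux residuals by passing the region label of each flux component and the corresponding diagonal of Q.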

3. Application to CO2 Data

[23] The method is illustrated using monthly averaged carbon dioxide data for 1996 through 2000, from 75 Climate Monitoring and Diagnostics Laboratory (CMDL) cooperative air sampling network sites (Figure 2). These data represent a total of 2698 monthly averaged observations. The earth is broken up into 22 flux regions (Figure 3), according to the divisions used in the TransCom3 experiment, an atmospheric transport inversion intercomparison study [Gurney et al., 2002, 2003, 2004]. The transport matrix H was obtained by running month-long unit pulses originating from each region and in each month of the study period, and recording the monthly averaged response at each sampling time and location. The spatial flux patterns within regions were based on work by Randerson et al. [1997] for net ecosystem production (NEP), and Takahashi et al. [2002] for net oceanic carbon exchange. The influence of fossil fuel emissions, based on work by Brenkert [1998] and Andres et al. [1996], was presubtracted from the measurements. This setup is similar to that used in TransCom3 [Gurney et al., 2002]. The transport model used was TM3 [Heimann, 1996] run at 7.5° latitude by 10.0° longitude resolution, with nine vertical levels spanning the surface to 10 hPa and a 3-hour integration time step. The model was implemented with National Centers for Environmental Prediction (NCEP) windfields corresponding to the modeled years. It is important to note that the obtained covariance parameters are a function of the particular setup, and are not necessarily representative of optimal values to be used in significantly different inversion setups. We recommend that researchers implement the ML method with their particular data and setup in order to obtain optimal values to be used in their own work.

Figure 2.

NOAA CMDL cooperative air sampling network sites used in the analysis. Sites indicated in blue and in green represent ship tracks.

Figure 3.

Definition of 22 Transcom3 regions on a 1° latitude by 1° longitude scale.

3.1. Setup and Results

[24] In the presented applications, observation locations and flux regions are grouped according to various characteristics commonly thought to be indicative of their relative uncertainty. This is done partly to limit the number of parameters to be estimated, but, more importantly, to demonstrate that the method is able to identify differences between various observation locations and regions that we expect (from our conceptual understanding of the problem) to behave differently. The number of covariance parameters estimated in the presented examples ranges from 2 to 7, but a larger number of parameters could be estimated if deemed necessary in future studies.

[25] Four different cases were considered. These cases are designed to (1) demonstrate the applicability of the derived method given different assumptions regarding the uncertainty associated with measurements and flux regions, (2) verify that the method can be used to distinguish the covariance characteristics of different groups of observations and/or flux regions, (3) verify that the fluxes resulting from inversions incorporating ML estimates of covariance parameters yield residuals with the specified covariance structure, and (4) demonstrate the versatility of the proposed method.

[26] In the first case, the uncertainty associated with all measurements is assumed to be the same for all measurement locations, and the uncertainty of the prior flux estimates is assumed not to vary with flux regions. In the second case, observations are grouped into marine boundary layer sites, high elevation or desert sites, and all other remaining (i.e., continental) sites, because such a distinction is commonly thought to strongly contribute to the degree to which measurements can be reproduced. Measurements from marine boundary layer sites can typically be reproduced most closely, and high elevation and desert sites may also exhibit lower model-data mismatch relative to other continental sites. Regions in this and subsequent cases are separated into land and ocean regions, because prior fluxes for ocean regions are considered to be less uncertain than those in land regions, and separating the regions in this way allows the ML algorithm to potentially identify this feature. In the third case, observation locations are separated into five groups. Contrary to case 2, however, the distinction is not based on specific physical characteristics of the locations. Instead the observation locations are separated in an ad-hoc fashion based on the observed model-data mismatch at those locations. In the fourth case, finally, the residual standard deviation from a smooth curve of measurements taken at individual locations is assumed to be indicative of the expected model-data mismatch. Therefore, instead of separating observation locations into a set number of groups with a constant model-data mismatch variance for each group, the relative variance for each observation location is assumed to be proportional to the residual variance from a smooth curve. The ML method is then used to solve for a single scaling factor which determines the model-data mismatch variance at each site.
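The fourth case can be sketched as a one-parameter model for R in which a single scaling factor multiplies site-specific residual variances. The function `scaled_mismatch_covariance`, the scaling factor, and the residual standard deviations below are hypothetical values used only to show the structure.

```python
import numpy as np

def scaled_mismatch_covariance(c, rsd, station):
    """R = c * diag(rsd[station]^2): one scaling factor c shared by all sites."""
    return c * np.diag(np.asarray(rsd, dtype=float)[station] ** 2)

# Hypothetical residual standard deviations (ppm) for three stations
rsd = np.array([0.2, 0.8, 1.5])
station = np.array([0, 0, 1, 2, 2])      # station index of each observation
R = scaled_mismatch_covariance(2.0, rsd, station)
dR_dc = scaled_mismatch_covariance(1.0, rsd, station)   # derivative of R with respect to c
print(np.sqrt(np.diag(R)))               # implied sigma_R per observation
```

Because R is linear in the scaling factor, its derivative with respect to that factor is a fixed matrix, which keeps the likelihood derivatives simple.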

[27] All cases presented here involve diagonal model-data mismatch (R) and prior error (Q) covariance matrices. Estimated covariance parameters are summarized in Table 1. For all the presented cases, implementing an inversion with the optimized parameters resulted in an overall $\chi^2_{r,z}$ and $\chi^2_{r,s}$ of 1.0. This indicates that the ML method yields covariance parameters that lead to fluxes and modeled observations with overall variances that are exactly consistent with the specified model-data mismatch and prior error covariance parameters. As already outlined, this is not the criterion used by the ML method to select parameters, but simply a side effect of identifying the correct parameters.

Table 1. Estimated Covariance Parameters for Examined Casesᵃ

| Case | Station group | Model-data mismatch σR (ppm) | Region group | Prior error σQ (GtC/yr) |
|---|---|---|---|---|
| 1 | all stations | 1.63 ± 0.03 | all regions | 2.17 ± 0.08 |
| 2 | marine boundary layer | 0.71 ± 0.02 | ocean | 1.07 ± 0.07 |
|   | high elevation and desert | 1.49 ± 0.06 | land | 2.02 ± 0.12 |
|   | other | 3.16 ± 0.10 | | |
| 3 | subset 1 | 0.58 ± 0.01 | ocean | 1.21 ± 0.06 |
|   | subset 2 | 1.04 ± 0.04 | land | 1.76 ± 0.10 |
|   | subset 3 | 1.91 ± 0.07 | | |
|   | subset 4 | 4.36 ± 0.23 | | |
|   | subset 5 | 7.64 ± 0.72 | | |
| 4 | all stations, weighted | 0.11 ± 0.002 to 5.96 ± 0.13 | ocean | 0.89 ± 0.08 |
|   | | | land | 2.08 ± 0.15 |

ᵃ Intervals represent 1 standard deviation.

3.1.1. Case 1 (Figure 4)

[28] The first case considered a single variance for the model-data mismatch for all stations, and a second single variance for the prior error for all regions. The model-data mismatch standard deviation was determined to be σR = 1.63 ± 0.03 ppm, where the intervals represent 1 standard deviation. This error range is consistent with typical values used in inversions. The overall prior error standard deviation was σQ = 2.17 ± 0.08 GtC/yr.

Figure 4.

The $\chi^2_r$ statistics calculated from conditional realizations of the a posteriori fluxes resulting from an inversion with covariance parameters optimized according to case 1. (top) Values for individual observation stations. (bottom) Values for individual flux regions. The solid lines represent $\chi^2_r = 1.0$, which is the mean $\chi^2_r$ in both panels (across all stations or regions).

[29] The $\chi^2_r$ statistic broken down by individual station is presented in Figure 4 (top), and the same statistic broken down by individual region is presented in Figure 4 (bottom). As can be seen in the figure, this simple setup cannot capture the fact that measurements taken at some stations cannot be reproduced as precisely as those from other stations. As a result, a few observation stations, and especially Hungary (HUN), have a large overall impact on the model-data mismatch variance, because they are much more difficult to reproduce relative to other stations. Also, the Europe flux region deviates from its prior flux estimate much more strongly than all other regions. As will be seen in the other cases, however, this effect disappears once certain stations such as HUN are allowed to have higher model-data mismatch variances relative to other stations. The covariance parameters estimated using this simplest setup still guarantee that, on average, measurements and prior flux estimates are reproduced to the degree prescribed by the covariance parameters. This is evidenced by the fact that, in this and all subsequent cases, implementing an inversion with the optimized parameters results in an overall $\chi^2_{r,z}$ and $\chi^2_{r,s}$ of 1.0.
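The way a few hard-to-reproduce stations dominate a single pooled variance can be illustrated with a small synthetic sketch. It does not run an inversion: it simply draws residuals for three hypothetical stations (names and standard deviations are made up), fits one pooled variance, and computes a per-station reduced chi-square, assumed here to be the mean squared residual divided by the pooled variance.

```python
import numpy as np

rng = np.random.default_rng(11)

n_per_station = 5000
# Hypothetical stations and their true residual standard deviations (ppm).
true_sd = {"quiet_1": 0.5, "quiet_2": 1.0, "noisy": 4.0}

# Synthetic zero-mean residuals for each station.
residuals = {name: rng.normal(scale=sd, size=n_per_station)
             for name, sd in true_sd.items()}

# One pooled variance shared by all stations, as in a case-1-style setup.
pooled_var = np.mean(np.concatenate(list(residuals.values())) ** 2)

# Per-station reduced chi-square relative to the pooled variance.
chi2_r = {name: np.mean(res ** 2) / pooled_var for name, res in residuals.items()}

# With equal sample counts per station, the mean over stations equals 1.
overall = np.mean(list(chi2_r.values()))

for name, value in chi2_r.items():
    print(f"{name}: chi2_r = {value:.2f}")
print(f"mean over stations = {overall:.2f}")
```

In this toy setting the noisy station ends up well above 1 and the quiet stations well below 1, while the average stays at 1, which mirrors the qualitative pattern described for Figure 4.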

3.1.2. Case 2 (Figure 5)

[30] The second case broke observation locations into three groups: marine boundary layer (MBL) sites, high elevation or desert sites, and other sites. The flux regions were separated into land regions and ocean regions. Therefore case 2 required the estimation of a total of five variances. The $\chi^2_r$ statistics for individual measurement locations and regions are presented in Figure 5.

Figure 5.

The $\chi^2_r$ statistics calculated from conditional realizations of the a posteriori fluxes resulting from an inversion with covariance parameters optimized according to case 2. (top) Values for individual observation stations. (bottom) Values for individual flux regions. Color coding designates grouping of observation locations and flux regions according to the definitions presented in case 2. The solid lines represent $\chi^2_r = 1.0$, which is the mean $\chi^2_r$ in both panels (across all stations or regions).

[31] Notably, the ML algorithm clearly recognizes that observations at marine boundary layer stations can be reproduced with higher precision than those at non-MBL sites. In addition, the model-data mismatch for high elevation and desert sites is lower than that for other continental sites. The model-data mismatch standard deviation for MBL sites estimated by the ML routine and then incorporated into the R matrix is σR = 0.71 ± 0.02 ppm, whereas it is σR = 1.49 ± 0.06 ppm for high elevation and desert sites, and σR = 3.16 ± 0.10 ppm for other non-MBL sites. This is consistent with both our intuitive understanding of the problem and with past inversion studies that have found that a higher variance needed to be used for continental sites.

[32] Similarly, the ML algorithm assigned a higher prior error to land flux regions relative to ocean regions. The prior error standard deviation was σQ = 2.02 ± 0.12 GtC/yr for land regions, and σQ = 1.07 ± 0.07 GtC/yr for ocean regions. Again, past inversion studies have also assigned greater uncertainties to the land priors relative to ocean priors. In this study, however, this conclusion is derived directly from the available data. This confirms that the ML method is able to identify and corroborate the common assumption that land flux prior estimates are more uncertain than ocean flux prior estimates.

3.1.3. Case 3 (Figure 6)

[33] In case 3, observation locations were separated into groups purely based on the model's ability to reproduce the observations. Five different groups were formed. The resulting $\chi^2_r$ statistics for individual measurement locations and regions are presented in Figure 6, with the colors indicating the various groups. The resulting model-data mismatch and prior error standard deviations are presented in Table 1.

Figure 6.

The $\chi^2_r$ statistics calculated from conditional realizations of the a posteriori fluxes resulting from an inversion with covariance parameters optimized according to case 3. (top) Values for individual observation stations. (bottom) Values for individual flux regions. Color coding designates grouping of observation locations and flux regions according to the definitions presented in case 3. The solid lines represent $\chi^2_r = 1.0$, which is the mean $\chi^2_r$ in both panels (across all stations or regions).

[34] This third case results in the per-station and per-region $\chi^2_r$ statistics being most closely clustered around 1.0. Part of this is due to the fact that we are estimating the largest number of parameters, getting closer to a case where each station and each region would have its own variance, and where these statistics would be expected to be very close to 1.0 for all stations and regions. The second reason for this clustering is that we are using the performance of the model itself to help group observation locations, instead of relying exclusively on our physical understanding of the problem. In such a setup, sites for which we may expect lower errors due to their physical locations may be assigned to groups with higher errors, recognizing potential transport errors due to, for instance, biased wind fields, incorrectly assigned regional flux patterns, and/or sampling error at coastal sites. In practice, when applying this method, researchers can choose how they want to group sites: they can rely on physical differences for classification, or allow the numerical results to direct their choices.

3.1.4. Case 4 (Figure 7)

[35] In this last case (Figure 7), instead of breaking the observation locations into groups and estimating a single model-data mismatch variance for each group, each station was assigned a scaling factor based on the average variance between observations taken at that station and a smooth curve fit. These constants were taken from results obtained during the TransCom3 experiment, and were scaled by the proportion of real data in the GLOBALVIEW-CO2 [2000] record [Gurney et al., 2003]. The ML routines were then used to derive a single proportionality constant that would scale these variances to obtain an optimal overall model-data mismatch for each site. The resulting model-data mismatch covariance function was

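$$\mathbf{R} = C \begin{bmatrix} \sigma^2_{S,1} & 0 & \cdots & 0 \\ 0 & \sigma^2_{S,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_{S,n} \end{bmatrix} \qquad (29)$$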

where $\sigma^2_{S,i}$ are the variances of the deviations of samples taken at individual sites from a smooth curve fit, and C is the single multiplicative factor that we will estimate using the ML method. The R matrix still has dimensions n × n, where n is the total number of samples considered, but the values on the diagonal correspond to the residual variances observed at the site where each observation was taken. The prior error was broken into land and ocean groups, as in cases 2 and 3. Note that this final case uses only 41 stations: the subset of the 75 stations examined here that were also used in the TransCom3 annual-average inversion study [Gurney et al., 2003]. This also means that there are a total of 41 different variances that populate the matrix in equation (29).
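A minimal sketch of how such a matrix could be assembled is given here, assuming NumPy. The site names, residual standard deviations, and the mapping from observations to sites are made-up illustrative values, not data from this study; only the scaling factor C = 0.80 is the value reported for case 4.

```python
import numpy as np

# Hypothetical residual standard deviations (ppm) of each site's samples
# about a smooth curve fit. Illustrative values only.
sigma_s = {"SITE_A": 0.2, "SITE_B": 1.0, "SITE_C": 2.5}

# Hypothetical site label for each of the n observations.
obs_site = ["SITE_A", "SITE_B", "SITE_A", "SITE_C", "SITE_B"]

def build_R(C, sigma_s, obs_site):
    """Diagonal model-data mismatch covariance.

    Entry (i, i) is C times the residual variance of the site at which
    observation i was taken; all off-diagonal entries are zero.
    """
    variances = np.array([sigma_s[site] ** 2 for site in obs_site])
    return C * np.diag(variances)

# C = 0.80 is the scaling factor reported in the text for case 4.
R = build_R(0.80, sigma_s, obs_site)
print(np.diag(R))
```

Because only C is free, the ML search in this configuration is one-dimensional for the model-data mismatch, regardless of how many sites contribute observations.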

Figure 7.

The $\chi^2_r$ statistics calculated from conditional realizations of the a posteriori fluxes resulting from an inversion with covariance parameters optimized according to case 4. (top) Values for individual observation stations. (bottom) Values for individual flux regions. The solid lines represent $\chi^2_r = 1.0$, which is the mean $\chi^2_r$ in both panels (across all stations or regions).

[36] The optimal value of the multiplicative factor was found to be 0.80, with a standard deviation of 0.14, indicating that residual standard deviations should be scaled by a factor of $\sqrt{C}$ = 0.89 in order to, on average, represent an appropriate model-data mismatch. The $\chi^2_r$ statistics for individual measurement locations and regions are presented in Figure 7. As can be seen from the figure, classification based on the deviation of the observations from a smooth curve is a much better predictor of model-data mismatch than a constant variance for all stations (Figure 4). "Better" in this case refers to the amount of scatter of the per-station and per-region $\chi^2_r$ values around one, because the overall $\chi^2_{r,z}$ and $\chi^2_{r,s}$ are 1.0 for both cases. This last case does not behave as well as subdividing the stations into a moderate number of constant-variance groups (Figure 6), however. For a few stations (HUN, POC13, PSA, SHM in Figure 2) the model-data mismatch appears to be significantly higher relative to the variance of deviations from a smooth curve as compared to other stations. This is not to say that the variance of the deviation from a smooth curve could not be used as part of the determinant of the relative variance expected at a site, but it does indicate that this criterion may not be effective if used as the only guiding principle.
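The conversion from the variance factor to the standard-deviation factor is a square root. The short sketch below reproduces that arithmetic. As an assumption that goes beyond the text, it also propagates the reported uncertainty on C to its square root using a first-order (delta-method) approximation; that propagated value is illustrative and is not a result reported in this study.

```python
import math

C = 0.80      # ML estimate of the variance scaling factor (from the text)
sd_C = 0.14   # reported standard deviation of C (from the text)

# Factor applied to the residual standard deviations.
scale_sd = math.sqrt(C)

# First-order propagation (an assumption of this sketch):
# sd(sqrt(C)) is approximately sd(C) / (2 * sqrt(C)).
sd_scale_sd = sd_C / (2.0 * math.sqrt(C))

print(f"sqrt(C) = {scale_sd:.2f}")  # prints "sqrt(C) = 0.89"
print(f"approximate sd of sqrt(C) = {sd_scale_sd:.3f}")
```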

[37] Whereas the model-data mismatch standard deviations used in the TransCom3 annual inversion study [Gurney et al., 2003] ranged from 0.04 ppm (SYO) to 2.23 ppm (ITN), depending on the residual standard deviation of samples taken at individual sites from a smooth curve fit, the current analysis suggests that, if all stations are considered to have model-data mismatch standard deviations proportional to their residual standard deviations, the model-data mismatch standard deviations range from 0.11 ppm (SYO) to 5.96 ppm (ITN). The stations with the maximum and minimum model-data mismatch are the same in the two studies because all model-data mismatch standard deviations are determined by multiplying the residual standard deviations of samples taken at a given site by a constant factor. Overall, the model-data mismatch variances inferred using the ML method are higher relative to those used in the TransCom3 annual inversions. This is likely due, at least in part, to the fact that the TransCom3 study used smoothed GLOBALVIEW data [GLOBALVIEW-CO2, 2000] in its analysis, whereas the current estimates were obtained using CO2 flask data directly. In addition, whereas the TransCom3 study focused on annually averaged fluxes, we have instead estimated monthly fluxes. Additional differences such as the prior flux uncertainties, the selected transport models, and the subsets of sampling sites used would also contribute to this difference. Note also that, in the TransCom3 study, all model-data mismatch standard deviations were ultimately modified to be at least 0.25 ppm, whereas no such adjustments were made here.

3.2. Effect on Flux Estimates

[38] The purpose of this work is to present a method for estimating covariance matrix parameters used in Bayesian inversions, and presenting detailed flux estimates for the examined period falls outside the scope of this work. However, given that each of the sets of covariance parameters is optimal for its respective setup, it is interesting to examine the effect of the choice of covariance structure on estimated fluxes.

[39] The choice of covariance parameter setup has an effect on both the flux magnitudes and their estimated uncertainties. The effect on the posterior covariance of the fluxes is relatively straightforward. From equation (3), it is apparent that once a transport model has been selected (thereby fixing H), the covariance matrices fully determine the posterior covariance. What is more difficult to ascertain is the effect on the best estimates of the fluxes. Some of the key differences are illustrated in Figure 8. This figure represents the total non-fossil-fuel flux and uncertainty for the year 2000 for four of the 22 TransCom regions. These estimates were obtained by summing the monthly estimated fluxes for 2000. The uncertainties were obtained by summing the appropriate elements of the posterior covariance matrix, including off-diagonal temporal correlation terms. The results for Temperate North America are typical of relatively well-constrained land regions. The different cases yield different total fluxes, but these fluxes generally fall within the other cases' confidence bounds. The results for South America are typical of poorly constrained land regions. The different cases have little effect on either the fluxes or their uncertainties because the measurements contain little information regarding these regions. The fluxes for Europe show quite dramatic variation, most likely due to how these four cases treat measurements from the Hungary (HUN) observation site. Data from this site are very difficult to reproduce, as is also evidenced in Figures 4 to 7. The fluxes for the North Atlantic are typical of results for oceanic regions. The strongest difference is between case 1, where land and ocean prior fluxes are represented by a single error variance, and the other cases, where prior estimates of fluxes over land are assigned a different error variance than the oceanic prior fluxes. In case 1, the prior error on ocean regions is significantly higher relative to the other cases, yielding a higher posterior uncertainty for these regions.
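The aggregation of monthly estimates into an annual total and its uncertainty can be sketched as follows, assuming NumPy. The monthly fluxes and the 12 × 12 posterior covariance block are synthetic stand-ins in arbitrary flux units; the point is only that the variance of a sum is the sum of all elements of the corresponding covariance block, off-diagonal terms included.

```python
import numpy as np

rng = np.random.default_rng(1)

n_months = 12
# Hypothetical monthly flux estimates for one region (arbitrary units).
s_hat_monthly = rng.normal(loc=0.0, scale=0.1, size=n_months)

# Hypothetical posterior covariance block for those 12 months: a symmetric
# positive semidefinite matrix with nonzero off-diagonal (temporal) terms.
A = rng.normal(size=(n_months, n_months))
V_block = 0.01 * (A @ A.T) / n_months

ones = np.ones(n_months)
annual_flux = ones @ s_hat_monthly       # sum of the monthly fluxes
annual_var = ones @ V_block @ ones       # sum of ALL covariance elements
annual_sd = np.sqrt(annual_var)

# Dropping the off-diagonal terms generally gives a different answer.
annual_sd_diag_only = np.sqrt(np.trace(V_block))

print(annual_flux, annual_sd, annual_sd_diag_only)
```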

Figure 8.

Examples of the effect of the four different covariance parameter cases on estimated fluxes and their uncertainties. The circles represent the estimated average flux for 2000 for four Transcom regions for the four examined cases. The bars represent the estimated standard deviations.

[40] The main conclusion of this analysis is that the choice of covariance matrix setup, and not only the choice of covariance parameter values, can have a strong effect on estimated fluxes. Therefore expert knowledge must be used to guide the selection of an inversion setup.

3.3. Discussion

[41] In all cases, the ML approach is able to estimate covariance parameters that result in residuals that have the variability specified in the covariance matrices (i.e., $\chi^2_r = 1$). This had been the primary goal of previous attempts to estimate covariance parameters. In addition, the method has been demonstrated as a tool for estimating the overall and relative variance associated with different stations and regions. In all cases, the ML method results in covariance parameters that are qualitatively consistent with our physical understanding of the relative variance of observations from various stations and fluxes from various regions. The method goes beyond this qualitative assessment, however, and quantifies the differing uncertainties associated with individual stations and/or regions. In addition, the method was demonstrated as being an effective guiding tool for grouping observation locations into groups that behave similarly. The method can also incorporate additional information, such as standard deviations of observations from smooth curve fits, into the estimation of covariance parameters. Last, the method provides uncertainty bounds on the covariance parameters, which could allow for sensitivity runs exploring the effect of covariance parameter uncertainty on the estimated flux distributions.

[42] As expected and as can be seen from Table 1, increasing the number of possible variance parameters allows for individual groups of stations or regions to have different behavior. When the grouping corresponds to real differences in the model-data mismatch, the $\chi^2_r$ statistics of individual stations are clustered more closely around 1.0, as can be seen when comparing Figures 4 to 7. It appears especially important to assign a different variance to stations that are particularly difficult to match, such as Hungary (HUN). Otherwise, as can be seen in Figure 4, these few stations have too great an influence on the overall covariance parameters, and as a result most stations have $\chi^2_r$ significantly below 1.

[43] Case 3 also demonstrated that one does not necessarily need to separate stations based on our conceptual understanding of the difference between various sites, but can in fact let the model guide group selections, and thereby identify sites that behave similarly, in terms of the degree to which their observations can be reproduced given the selected inversion setup.

[44] Case 4 demonstrated the possibility of using the ML algorithm to solve for variance proportionality constants, instead of solving for individual group variances. Future studies could explore indices based on other physical attributes, to determine whether they can predict variability at given sites, and whether they can do this better relative to binning sites into fixed-variance groups, as was done in cases 1–3.

[45] Finally, as would be expected, the larger the number of covariance parameters that are to be estimated, the higher their uncertainty. This is a direct result of the fact that, as the number of observation location groups increases, there are fewer measurements to constrain the covariance parameters of each individual group. Overall, however, the uncertainty of the covariance parameters appears quite small. It is important to note that this low uncertainty is only valid if all the assumptions used in the problem setup are valid. As is currently typical in flux inversions we have assumed that model-data mismatch errors are independent, as are the prior flux errors for large regions. In addition, we have binned the covariance parameters into a relatively small number of representative groups. Therefore the uncertainties on the covariance parameters should be interpreted in the context of the specific setup used here. If, for example, a different model-data mismatch uncertainty were to be calculated for each observation site, we would expect a higher uncertainty on these parameters relative to the uncertainty on the covariance parameter averaged over several sites, as was done here.

4. Conclusions

[46] The method presented in this paper uses a maximum likelihood framework with available observations, prior flux estimates and transport information to optimize the covariance parameters needed in the solution of Bayesian inverse problems used to estimate surface fluxes of atmospheric trace gases. The method can also be applied to estimate the uncertainty in these parameters. The strong influence and critical importance of these parameters has been discussed in many studies [Kaminski et al., 1999; Rayner et al., 1999; Law et al., 2002; Peylin et al., 2002; Engelen et al., 2002]. Up to this point, however, no objective method was available to estimate these parameters from the available data and guarantee that the resulting flux and observation residuals would follow the assumed distribution.

[47] In addition to optimizing covariance parameters for a given inversion setup, the method can also be used to evaluate whether grouping certain measurement stations or flux regions is justified given the available information. For example, if stations are grouped according to whether or not they constitute marine boundary layer sites, and the resulting model-data mismatch variances are different for the two groups, this suggests that whether or not a station samples marine boundary layer air is a predictor of the precision with which the data can be reproduced.

[48] The examples presented in this paper were for diagonal model-data mismatch and prior error covariance matrices. The method can be directly applied, however, to more complex covariance matrices, where correlation lengths or other parameters must be estimated in addition to variances. Such applications have been demonstrated using the restricted maximum likelihood approach used in geostatistical inverse modeling [Michalak et al., 2004].

[49] Note that we do not advocate researchers using the model-data mismatch variances and prior error variances derived in this work directly, because the optimal values will depend not only on the data set used, but also on the transport model, the defined flux regions, flux patterns within regions, etc. Instead, we recommend that researchers implement the maximum likelihood algorithm described in this work in order to gain insight into covariance parameters that are appropriate for their specific applications. The examples presented in this work simply demonstrate the applicability of the presented method to typical inversions.

[50] Most importantly, in all the presented cases, the ML method identified the most likely covariance parameter values, given the selected inversion setup and covariance matrix definitions. These variances are optimized solely using the available data, transport information, and prior flux estimates.
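To make the idea concrete, the sketch below estimates a single model-data mismatch standard deviation and a single prior error standard deviation (a case-1-style setup) from synthetic data. It assumes the objective is the Gaussian marginal negative log-likelihood of z − H s_p with covariance H Q Hᵀ + R, which is the standard form for this kind of problem; the objective actually used in this work is defined in an earlier section and is not restated here. The transport operator, problem sizes, "true" values, and the coarse grid search are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 20                              # observations, flux unknowns (synthetic)
H = rng.normal(size=(n, m)) / np.sqrt(m)    # stand-in for a transport operator
s_p = np.zeros(m)                           # prior flux estimate
sigma_r_true, sigma_q_true = 1.5, 2.0       # values used only to simulate data

# Synthetic "true" fluxes and observations consistent with Q and R.
s_true = s_p + sigma_q_true * rng.normal(size=m)
z = H @ s_true + sigma_r_true * rng.normal(size=n)

HHt = H @ H.T
d = z - H @ s_p                             # data minus prior-predicted data

def neg_log_likelihood(sigma_r, sigma_q):
    """0.5*ln|Psi| + 0.5*d^T Psi^-1 d, with Psi = sigma_q^2 H H^T + sigma_r^2 I."""
    psi = sigma_q ** 2 * HHt + sigma_r ** 2 * np.eye(n)
    _, logdet = np.linalg.slogdet(psi)
    return 0.5 * logdet + 0.5 * d @ np.linalg.solve(psi, d)

# Coarse grid search for illustration; a real application would likely use
# a gradient-based or other iterative optimizer.
grid_r = np.linspace(0.5, 3.0, 26)
grid_q = np.linspace(0.5, 4.0, 36)
nll = np.array([[neg_log_likelihood(r, q) for q in grid_q] for r in grid_r])
i, j = np.unravel_index(np.argmin(nll), nll.shape)
sigma_r_hat, sigma_q_hat = grid_r[i], grid_q[j]

print(f"sigma_R estimate: {sigma_r_hat:.2f}, sigma_Q estimate: {sigma_q_hat:.2f}")
```

With only 20 flux unknowns, the prior error estimate in this toy problem is much less tightly constrained than the model-data mismatch estimate, which is informed by 200 observations.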

Appendix A: Generation of Conditional Realizations

[51] If the model-data mismatch and prior-flux error covariance parameters are selected appropriately, residuals calculated from conditional realizations of the a posteriori fluxes will follow the distributions specified in Q and R. The method presented here generates equally likely conditional realizations of the function s. The first step is to generate an unconditional realization $s_{ui}$ of the unknown function s from the prior covariance Q. Here the subscript u indicates that the realization has not been conditioned on the data, and the subscript i serves as a counter and a reminder that there is an infinite number of possible realizations. There are several methods for generating $s_{ui}$, and the method presented here is based on eigenvalue decomposition. In the general case,

$$s_{ui} = V \lambda^{1/2} \varepsilon_s \qquad (A1)$$

where V is an (m × m) matrix containing the eigenvectors of Q, $\lambda^{1/2}$ is an (m × m) diagonal matrix with the square roots of the eigenvalues of Q on the diagonal, and $\varepsilon_s$ is an (m × 1) vector of normally distributed random numbers with mean zero and variance one. For the case where Q is a diagonal matrix, as is often the case in classical Bayesian inverse modeling, the above equation simplifies to

$$s_{ui} = Q^{1/2} \varepsilon_s \qquad (A2)$$

where $Q^{1/2}$ is a diagonal matrix with the square roots of the prior error variances on the diagonal.
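The eigenvalue-decomposition step can be sketched as follows, assuming NumPy. The 3 × 3 covariance matrix is a made-up example, and the function name is hypothetical. The sketch draws many zero-mean realizations and checks that their sample covariance approaches Q.

```python
import numpy as np

rng = np.random.default_rng(42)

def unconditional_realizations(Q, n_real, rng):
    """Draw zero-mean samples with covariance Q using Q = V diag(lam) V^T.

    Each column of the result is V @ diag(sqrt(lam)) @ eps, where eps is a
    vector of independent standard normal draws.
    """
    lam, V = np.linalg.eigh(Q)            # Q is symmetric, so eigh applies
    lam = np.clip(lam, 0.0, None)         # guard against tiny negative round-off
    eps = rng.standard_normal((Q.shape[0], n_real))
    return V @ (np.sqrt(lam)[:, None] * eps)

# Hypothetical prior covariance with off-diagonal terms (illustrative values).
Q = np.array([[4.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

samples = unconditional_realizations(Q, 200_000, rng)
Q_sample = np.cov(samples)                # rows are variables, columns are draws

print(np.round(Q_sample, 2))
```

For a diagonal Q the eigenvectors are coordinate axes, so the same function reduces to scaling each standard normal draw by the square root of the corresponding prior variance.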

[52] The conditional realization is then obtained through

equation image

where the subscript c refers to the fact that the realization has been conditioned on the data z, and $\varepsilon_z$ is an (n × 1) vector sampled from the model-data mismatch covariance matrix R, using the same method as was described in equation (A1). The validity of this algorithm can easily be demonstrated numerically by generating a large number of conditional realizations and verifying that their statistics (e.g., median and 95% confidence intervals) are identical to those implied by $\hat{s}$ and $V_{\hat{s}}$.
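The numerical check described here can be sketched as follows, assuming NumPy. Because the conditioning equation itself is not reproduced in the extracted text, the sketch uses one standard construction as an assumption: draw s_u from the prior (prior mean plus a draw with covariance Q), perturb the data with a draw from R, and update with the gain K = Q Hᵀ (H Q Hᵀ + R)⁻¹. The transport operator, covariances, and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)

m, n = 4, 6                                   # fluxes, observations (toy sizes)
H = rng.normal(size=(n, m))                   # hypothetical transport operator
Q = np.diag([1.0, 2.0, 0.5, 1.5])             # diagonal prior error covariance
R = 0.3 * np.eye(n)                           # diagonal model-data mismatch covariance
s_p = np.zeros(m)                             # prior flux estimate
# Synthetic observation vector.
z = H @ rng.normal(size=m) + rng.normal(scale=np.sqrt(0.3), size=n)

K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)  # gain matrix
s_hat = s_p + K @ (z - H @ s_p)               # posterior best estimate
V_s_hat = Q - K @ H @ Q                       # posterior covariance

n_real = 200_000
# Unconditional realizations: prior mean plus Q^{1/2} times unit normals.
s_u = s_p[:, None] + np.sqrt(np.diag(Q))[:, None] * rng.standard_normal((m, n_real))
# Perturbations sampled from R.
eps_z = np.sqrt(np.diag(R))[:, None] * rng.standard_normal((n, n_real))
# Conditional realizations, one per column (assumed standard construction).
s_c = s_u + K @ (z[:, None] + eps_z - H @ s_u)

print(np.round(s_c.mean(axis=1) - s_hat, 3))
print(np.round(np.cov(s_c) - V_s_hat, 3))
```

Under this construction the sample mean and sample covariance of the realizations converge to the posterior best estimate and posterior covariance as the number of realizations grows.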

[53] In addition, substituting the definition of $\hat{s}$ (equation (2)) and using the definitions

equation image
equation image

the expected covariance of residuals from conditional realizations can be shown to be

equation image

verifying that, given appropriate covariance parameters, we expect these residuals to be sampled from Q and R.
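This expectation can also be checked numerically. The sketch below, assuming NumPy, simulates many independent synthetic problems in which the true fluxes and data are drawn consistently with Q and R, generates one conditional realization per problem using the same assumed standard construction (gain K = Q Hᵀ (H Q Hᵀ + R)⁻¹ applied to perturbed data), and compares the covariance of the resulting residuals with Q and R. The reduced chi-square is assumed here to be the mean squared residual normalized by the specified variance; all sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

m, n, n_trials = 5, 8, 100_000
H = rng.normal(size=(n, m))                   # hypothetical transport operator
q_var = np.array([1.0, 2.0, 0.5, 1.5, 3.0])   # prior error variances (diagonal Q)
r_var = np.full(n, 0.4)                       # mismatch variances (diagonal R)
Q, R = np.diag(q_var), np.diag(r_var)
s_p = np.zeros(m)

def draw(var, size):
    """Zero-mean normal draws (one per column) with the given diagonal variances."""
    return np.sqrt(var)[:, None] * rng.standard_normal((len(var), size))

# Synthetic truth and data consistent with Q and R, one column per trial.
s_true = s_p[:, None] + draw(q_var, n_trials)
z = H @ s_true + draw(r_var, n_trials)

# One conditional realization per trial (assumed standard construction).
K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)
s_u = s_p[:, None] + draw(q_var, n_trials)
s_c = s_u + K @ (z + draw(r_var, n_trials) - H @ s_u)

res_s = s_c - s_p[:, None]                    # flux residuals
res_z = z - H @ s_c                           # data residuals

chi2_r_s = np.mean(res_s ** 2 / q_var[:, None])   # reduced chi-square, fluxes
chi2_r_z = np.mean(res_z ** 2 / r_var[:, None])   # reduced chi-square, data

print(f"chi2_r_s = {chi2_r_s:.3f}, chi2_r_z = {chi2_r_z:.3f}")
```

When the covariance parameters used in the inversion match those that generated the data, both reduced chi-square values come out close to 1 and the sample covariances of the residuals approach Q and R.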


[54] Funding for Anna Michalak was partially provided by a NOAA Climate and Global Change postdoctoral fellowship, a program administered by the University Corporation for Atmospheric Research (UCAR). The authors would like to thank all of the collaborators in the NOAA/CMDL Cooperative Air Sampling Network, John B. Miller of NOAA-CMDL for helpful insights during the preparation of this manuscript, and Kimberley L. Mueller for help with figure preparation. This paper was also greatly improved as a result of suggestions offered by Ian G. Enting, Peter Rayner, and Christian Rödenbeck.