Corresponding author: F. Chevallier, Laboratoire des Sciences du Climat et de l'Environnement, CEA-CNRS-UVSQ, Gif-sur-Yvette, France.
 Statistical modeling is at the root of CO2 atmospheric inversion systems, but few studies have focused on the quality of the probability distributions that these systems assign. In this paper, we assess the reliability of the error models used as input and produced as output by a specific CO2 atmospheric inversion system when it assimilates surface air sample measurements. We confront these error models with the mismatch between 4D simulations of CO2 and independent satellite retrievals of the total CO2 column. Taking all sources of uncertainty into account, we show that both prior and posterior errors are consistent with the actual departures, to the point that the theoretical error reduction brought by the surface measurements to the simulation of the Greenhouse gases Observing SATellite (GOSAT) total column measurements (15%) corresponds to the actual reduction seen over the midlatitude and tropical lands and over the tropical oceans.
1 Introduction
 Measurements of CO2 mole fraction in the atmosphere carry the imprint of CO2 surface fluxes. However, reversing the sign of time to infer the latter from the former is an ill-posed mathematical problem, because both atmospheric mixing and sparse observational sampling cause part of the flux information to vanish: a given set of measurements is consistent with an infinite number of CO2 flux maps, not all of which would be judged realistic by carbon experts. The underdetermination has to be lifted with a regularization constraint, i.e., by introducing prior knowledge of the flux maps. In its most generic form, this is expressed with Bayes' theorem, which atmospheric inversion systems implement. Since Bayes' solution to the inference problem is a probability density function, the method directly quantifies the uncertainty of the flux estimate, an obvious advantage over alternative (bottom-up) methods for flux estimation [e.g., Zaehle et al., 2005; IPCC, 2000]. This capability is rarely highlighted because the Bayesian uncertainty estimates are themselves considered very uncertain, with a tendency toward overconfidence [e.g., Tolk et al., 2011]. Indeed, it is usually felt that there is not enough evidence to reliably fill the large covariance matrices that describe each input error component (as listed in, e.g., Engelen et al.). Therefore, when uncertainties are described, Bayesian posterior errors are often complemented by the spread of sensitivity tests [e.g., Gurney et al., 2002]. However, given the structure of Bayes' theorem, the realism of the inverted fluxes is tied to the realism of their Bayesian error bars: doubts about the credibility of the posterior errors extend to the credibility of the posterior fluxes themselves.
 In this paper, we assess the quality of the input and output error statistics of a given atmospheric inversion [Chevallier et al., 2011] by studying their consistency with the statistics of the model departures from independent observations. This inversion ingested air sample measurements of CO2 mole fractions; to evaluate its error statistics, we used independent column-averaged dry air mole fractions of CO2 (hereafter XCO2) retrieved from GOSAT over land and ocean for the year 2010. For our purpose, the GOSAT data have two advantages: they cover most latitudes, hence providing general statistics, and models simulate them with transport-induced errors below the part per million (ppm) level [Basu et al., 2011]. The method, the inversion system, and the independent satellite measurements are presented in sections 'Method', 'Inversion System', and 'Independent Measurements', respectively. Section 'Results' presents the results, and section 'Discussion and Conclusions' concludes the paper.
2 Method
 Atmospheric inversion systems usually aim at computing the most likely state xa of gridded fields of surface fluxes, jointly called x, given a series of mole fraction observations, jointly called y, and some prior state of the surface fluxes xb. Under the assumption of unbiased Gaussian-distributed errors for y and xb and of a linear dependency between the space of the surface fluxes x and the space of the observations y, the optimal least squares estimator for x can be obtained from [e.g., Rodgers, 2000]

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{K}\,(\mathbf{y} - \mathbf{H}\mathbf{x}^b) \tag{1}$$

$$\mathbf{K} = \mathbf{B}\mathbf{H}^T\,(\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1} \tag{2}$$
where B and R are the covariance matrices of the prior errors and of the observation errors, respectively, and H is a linear (or linearized) model of CO2 transport in the atmosphere, multiplied by an operator that samples the model output in the same way as the observations. K is called the gain matrix of the inversion system.
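As a minimal illustration of equations (1) and (2), the Python sketch below (not the authors' code; their system solves the same problem variationally, at a vastly larger size) computes the gain and the analysis for a hypothetical toy problem with two flux components and a single observation, so that the matrix inverse reduces to a scalar division. Matrices are plain lists of rows, and all numbers are illustrative.

```python
def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def analysis(xb, B, H, R, y):
    """Return (xa, K) for a single scalar observation y (toy sketch).

    xb: prior flux vector (n floats)
    B:  prior error covariance, n x n list of rows
    H:  observation operator as a 1 x n row (n floats)
    R:  observation error variance (float)
    """
    BHt = matvec(B, H)                      # B H^T (an n-vector)
    HBHt = dot(H, BHt)                      # H B H^T (a scalar here)
    K = [v / (HBHt + R) for v in BHt]       # gain, equation (2)
    innovation = y - dot(H, xb)             # y - H xb
    xa = [x + k * innovation for x, k in zip(xb, K)]  # analysis, equation (1)
    return xa, K


# Hypothetical inputs: two uncorrelated flux components seen through one observation
xb = [1.0, 2.0]
B = [[1.0, 0.0], [0.0, 1.0]]
H = [0.5, 0.5]
R = 0.5
xa, K = analysis(xb, B, H, R, y=2.0)
print(xa, K)  # → [1.25, 2.25] [0.5, 0.5]
```

Because the observation is scalar, no linear-algebra library is needed; with many observations one would solve a linear system rather than form the inverse explicitly.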
 The error statistics of xa are also Gaussian, and their covariance matrix A can be expressed as

$$\mathbf{A} = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{B} \tag{3}$$
where I is an identity matrix.
 In contrast to xa, A does not depend on the actual observation values y and can therefore be computed before any data are acquired, provided that the measurement locations and times are known. As a consequence, A is often less trusted than xa (see, e.g., the discussion in Peters et al., section S4), even though all the ingredients of A are involved in the computation of xa.
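The sketch below evaluates equation (3), A = (I − KH)B, for a hypothetical two-component toy problem with one observation. It is an illustration under these toy assumptions, not the authors' implementation. Note that the function takes no observation value y as input, which is exactly the property discussed above: A can be computed before any data are acquired.

```python
def posterior_covariance(B, H, R):
    """Return A = (I - K H) B for a single scalar observation (toy sketch).

    Only B, H and R are needed: the observed value y never enters.
    """
    n = len(B)
    BHt = [sum(B[i][k] * H[k] for k in range(n)) for i in range(n)]  # B H^T
    HBHt = sum(H[i] * BHt[i] for i in range(n))                      # H B H^T
    K = [v / (HBHt + R) for v in BHt]                                 # equation (2)
    HB = [sum(H[k] * B[k][j] for k in range(n)) for j in range(n)]    # row H B
    # (I - K H) B = B - K (H B), where K (H B) is an outer product
    return [[B[i][j] - K[i] * HB[j] for j in range(n)] for i in range(n)]


# Hypothetical inputs: identity B, H observing the mean of the two fluxes, R = 0.5
A = posterior_covariance([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], 0.5)
print(A)  # → [[0.75, -0.25], [-0.25, 0.75]]
```

The negative off-diagonal terms mean that the single observation constrains the sum of the two fluxes (its variance drops from 2.0 to 1.0) but not their difference (whose variance stays at 2.0).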
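 By definition, the covariance matrices A and B and their projections in observation space, $\mathbf{H}\mathbf{B}\mathbf{H}^T$ and $\mathbf{H}\mathbf{A}\mathbf{H}^T$, are the asymptotic limits of the error statistics estimated on finite populations. Here we consider a population of observations yI that are all independent of the inversion process. If the errors of the prior xb and of the analysis xa are not correlated with the errors of the independent observations yI, the following relations apply [e.g., Desroziers et al., 2005], with $\mathrm{E}[\cdot]$ denoting the expectation over the population:

$$\mathrm{E}\big[(\mathbf{y}_I - \mathbf{H}\mathbf{x}^b)(\mathbf{y}_I - \mathbf{H}\mathbf{x}^b)^T\big] = \mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R}_I \tag{4}$$

$$\mathrm{E}\big[(\mathbf{y}_I - \mathbf{H}\mathbf{x}^a)(\mathbf{y}_I - \mathbf{H}\mathbf{x}^a)^T\big] = \mathbf{H}\mathbf{A}\mathbf{H}^T + \mathbf{R}_I \tag{5}$$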
where RI is the covariance matrix of the errors of yI. RI is defined with the inversion system as a reference and therefore combines measurement errors, with covariance RIM, and the errors of H, with covariance RIH:

$$\mathbf{R}_I = \mathbf{R}_I^M + \mathbf{R}_I^H \tag{6}$$
 Representativeness errors are neglected (even though they may play a role around CO2 flux hot spots) because they only marginally affect the zonal statistics studied here. The left-hand sides of equations (4) and (5) involve the real measurements yI, while the right-hand sides are composed of assigned error covariance matrices in observation space. The equalities hold when the true (unknown) error statistics are used. In practice, the matrices B and R have to be replaced by estimates $\hat{\mathbf{B}}$ and $\hat{\mathbf{R}}$, yielding an estimated gain matrix $\hat{\mathbf{K}}$ (from equation (2) with inputs $\hat{\mathbf{B}}$, $\hat{\mathbf{R}}$, and H) and a posterior error covariance matrix $\hat{\mathbf{A}}$ (from equation (3) with inputs $\hat{\mathbf{K}}$ and $\hat{\mathbf{B}}$). Testing the equalities of equations (4) and (5) provides an elegant way to confront the assigned error statistics $\hat{\mathbf{B}}$ and $\hat{\mathbf{A}}$ with empirical evidence. However, we must recognize that, if the observation error RI is large relative to the projected flux errors, it obscures our ability to test the realism of $\hat{\mathbf{B}}$ and $\hat{\mathbf{A}}$, and vice versa. We also note that, following Desroziers et al., the left-hand sides of equations (4) and (5) are mean squares (MS) and may therefore include residual biases of the independent measurements yI, whereas the right-hand sides are variances that describe random (but possibly correlated) errors only.
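To make the consistency test concrete, the following sketch draws synthetic departures yI − Hxb for a single scalar observation and compares their mean square with the right-hand side of equation (4). It is a Monte Carlo illustration under assumed Gaussian errors with hypothetical variances, not the evaluation performed in this study. The optional `bias` argument (also hypothetical) shows why the left-hand side, being a mean square, also absorbs systematic errors that the right-hand side does not describe.

```python
import random


def departure_ms(flux_sd, obs_sd, bias=0.0, n=100_000, seed=0):
    """Mean square of synthetic departures y_I - H x^b (scalar sketch).

    flux_sd: true std. dev. of the prior flux error projected into y_I space
    obs_sd:  true std. dev. of the independent-observation error (R_I)
    bias:    optional systematic offset of y_I, absent from the right-hand side
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        d = rng.gauss(0.0, flux_sd) + rng.gauss(bias, obs_sd)
        total += d * d
    return total / n


# Hypothetical assigned statistics: H B H^T = 1.0 ppm^2 and R_I = 2.0 ppm^2
assigned = 1.0 + 2.0
empirical = departure_ms(flux_sd=1.0, obs_sd=2.0 ** 0.5)
print(f"MS = {empirical:.2f} ppm^2 vs H B H^T + R_I = {assigned:.2f} ppm^2")
```

Rerunning with `bias=1.0` raises the mean square by about 1 ppm² even though neither H B H^T nor R_I has changed, which is why the comparison mixes random and systematic errors.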
3 Inversion System
 Equations (4) and (5) are applied here to the flux inversion scheme of Chevallier et al. [2005b]. This system solves by variational optimization the problem whose solution is given by equation (1) and estimates the matrix A of equation (3) by a randomization approach. The randomization method consists of building an ensemble of perturbed inversions whose inputs follow the statistics of $\hat{\mathbf{B}}$ and $\hat{\mathbf{R}}$ and whose outputs therefore follow the error statistics of $\hat{\mathbf{A}}$ [Chevallier et al., 2007]. It serves here to compute $\mathbf{H}\hat{\mathbf{B}}\mathbf{H}^T$ and $\mathbf{H}\hat{\mathbf{A}}\mathbf{H}^T$ in equations (4) and (5).
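The randomization idea can be sketched for a hypothetical scalar problem: each ensemble member draws a prior and an observation from the assigned statistics around a zero truth, runs the analysis, and the spread of the resulting analyses estimates A. This is an assumption-laden illustration, not the system of Chevallier et al. [2007]: the closed form of equations (1) and (2) stands in for the variational solver, and the values of B, R, and H are arbitrary.

```python
import random


def randomized_posterior_variance(B, R, H, n_members=20_000, seed=1):
    """Monte Carlo estimate of the scalar posterior error variance A.

    Each member perturbs the prior and the observation according to the
    assigned variances B and R around a zero truth, then runs the analysis.
    The closed form of equations (1)-(2) stands in for a variational solver.
    """
    rng = random.Random(seed)
    K = B * H / (H * B * H + R)                # gain, equation (2), scalar case
    total = 0.0
    for _ in range(n_members):
        xb = rng.gauss(0.0, B ** 0.5)          # prior error drawn from B
        y = rng.gauss(0.0, R ** 0.5)           # observation of the zero truth, error from R
        xa = xb + K * (y - H * xb)             # analysis, equation (1)
        total += xa * xa                       # truth is zero, so xa is the analysis error
    return total / n_members


B, R, H = 4.0, 1.0, 1.0                        # hypothetical variances and operator
A_exact = (1.0 - B * H / (H * B * H + R) * H) * B   # equation (3), here 0.8
A_mc = randomized_posterior_variance(B, R, H)
print(f"ensemble estimate {A_mc:.3f} vs analytic {A_exact:.3f}")
```

The appeal of the approach is that it only requires running the inversion itself on perturbed inputs, so it scales to systems where A is far too large to form explicitly.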
 We study the inversion system when it is used to invert surface air sample measurements from 91 station records over the globe for the years 2009 and 2010, as described by Chevallier et al. The reader is referred to that paper for a full description of this configuration and of the assimilated measurements. For this letter, it is enough to recall three points. First, they used the global tracer transport model of Hourdin et al., called LMDZ, as part of H. Second, their prior errors (the matrix $\hat{\mathbf{B}}$ in the above equations) were designed based on experimental quantification over land [Chevallier et al., 2006] and on empirical considerations over the ocean. Third, the observed synoptic variability at each station was taken as a proxy of the observation uncertainty (which is driven by transport modeling errors in this case), with time-correlated errors for continuous measurements implicitly taken into account in the form of inflated variances.
 One change was made to the configuration of Chevallier et al.: the prior information for the terrestrial vegetation fluxes is now taken from a more recent version (version 22.214.171.124) of the Organizing Carbon and Hydrology in Dynamic Ecosystems model (ORCHIDEE) [Krinner et al., 2005], because this version better describes the seasonal cycle of vegetation fluxes and therefore better represents state-of-the-art prior information for flux inversion. The prior error covariance matrix has been updated accordingly, based on the method and data described in Chevallier et al., including an account of the space-time resolution of the inverted fluxes. Over a full year, the total 1-sigma uncertainty of the prior land fluxes now amounts to about 3.0 GtC yr−1, i.e., about the magnitude of the current global terrestrial vegetation sink.
4 Independent Measurements
 The independent observations yI are here XCO2 measurements retrieved from the GOSAT spacecraft over the sunlit part of the globe. At nadir, the retrievals have a footprint diameter of about 10 km. GOSAT is the foremost operational space mission dedicated to carbon. It is a joint effort of the Japanese Ministry of the Environment (MOE), the National Institute for Environmental Studies (NIES), and the Japan Aerospace Exploration Agency (JAXA). The spacecraft was launched in January 2009. We use the retrievals produced by NASA's Atmospheric CO2 Observations from Space (ACOS) project in partnership with the JAXA and NIES GOSAT teams. The Bayesian retrieval algorithm directly uses a detailed radiative transfer model and has been described by O'Dell et al. For each sounding, it yields a statistically optimal estimate of XCO2, a characterization of its specific vertical weighting (in the form of an averaging kernel) and of its uncertainty, along with other variables that influence the radiances, such as surface pressure and aerosol optical depth. In this work, we use the latest version, build 2.10, of the ACOS/GOSAT retrieval algorithm. This version is functionally similar to the previous version 2.9 documented by O'Dell et al., except that the aerosol formulation was changed so that the retrieval can fit aerosol contamination more accurately and account more completely for cross-talk errors with carbon dioxide via path length modifications. In addition, a revised filtering and bias correction scheme has been developed specifically for version 2.10. This scheme extends the approach of Wunch et al., who characterized the errors in versions 2.8 and 2.9 of the ACOS data using a simple assumption of XCO2 spatial uniformity in the Southern Hemisphere to assess errors and biases in the retrievals. 
To develop postprocessing filters and a reasonable bias correction, the latest work assesses the errors of GOSAT high-gain (H) and medium-gain (M) data over land, as well as glint-mode data over the ocean, using not only the “Southern Hemisphere Approximation” of Wunch et al. but also comparisons with multiple transport models, each of which uses input fluxes optimized by assimilating surface CO2 measurements. This approach, including the filters and bias correction, will be described in more detail in an upcoming publication [O'Dell et al., paper in preparation, 2013]. In addition, soundings with a ground level higher than 2000 m above sea level, or for which the model-minus-observation ground elevation difference exceeds 1000 m, are excluded from the present study because they may be poorly simulated by the global transport model. About 10,000 retrievals over land or ocean pass the screening tests each month and are used here.
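The two elevation-based exclusion rules can be written as a small filter. This is a sketch only: the record layout and field names below are hypothetical, and we assume that the model-minus-observation elevation difference is compared in absolute value, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Sounding:
    # Hypothetical record layout; field names are illustrative only.
    xco2_ppm: float
    surface_altitude_m: float   # ground elevation of the sounding
    model_altitude_m: float     # ground elevation of the transport-model grid cell


MAX_ALTITUDE_M = 2000.0
MAX_ELEVATION_MISMATCH_M = 1000.0


def passes_elevation_screening(s: Sounding) -> bool:
    """Apply the two elevation-based exclusion rules described in the text.

    Assumption: the model-minus-observation difference is taken in absolute value.
    """
    if s.surface_altitude_m > MAX_ALTITUDE_M:
        return False
    return abs(s.model_altitude_m - s.surface_altitude_m) <= MAX_ELEVATION_MISMATCH_M


soundings = [
    Sounding(390.1, 150.0, 300.0),    # kept
    Sounding(389.7, 2500.0, 2400.0),  # rejected: ground level above 2000 m
    Sounding(391.2, 1200.0, 50.0),    # rejected: elevation mismatch of 1150 m
]
kept = [s for s in soundings if passes_elevation_screening(s)]
```

These rules act in addition to the ACOS quality filters and bias correction, which are not reproduced here.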
 The single-sounding measurement errors, represented by $\mathbf{R}_I^M$, are taken directly as the posterior XCO2 errors from the Bayesian retrieval and combine instrument noise, interference, and smoothing errors [Connor et al., 2008]. They have been neither artificially inflated nor modified to account for the effects of the bias correction. Correlations of the measurement errors between different soundings (implying regional biases) have not been quantified but are likely to be substantial despite the bias correction, because retrieval biases related to aerosol, surface brightness, and topography are likely to be spatially and/or temporally correlated. Because these correlations are not presently quantified, we study neither the off-diagonal terms of equations (4) and (5) nor aggregates of retrievals that would characterize the error statistics at scales coarser than individual soundings. The uncertainty statistics of the transport model LMDZ in the simulation of XCO2, characterized here by $\mathbf{R}_I^H$, are taken from the statistics of the differences between two simulations of GOSAT retrievals using two different transport models [Chevallier et al., 2010]: the corresponding standard deviations are about 0.5 ppm.
5 Results
 Results are gathered for the year 2010. We distinguish between the lands and the oceans north of 20°N (referred to as LN20N and SN20N, respectively, in the following), the lands and the oceans south of 20°S (LS20S and SS20S, respectively), and the lands and the oceans between 20°S and 20°N (LTROP and STROP). Since we cannot distinguish between random and systematic errors (see previous paragraph), and following the usual practice [e.g., Desroziers et al., 2005], we use the root mean square (RMS) rather than the standard deviation to characterize the statistics of the model-minus-observation departures.
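The choice of the RMS follows from the identity MS = variance + mean²: the RMS retains any mean bias, whereas the standard deviation removes it. A minimal sketch with population (1/n) moments and made-up departure values:

```python
import math


def departure_stats(departures):
    """Return (mean, standard deviation, RMS) using population (1/n) moments."""
    n = len(departures)
    mean = sum(departures) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in departures) / n)
    rms = math.sqrt(sum(d * d for d in departures) / n)
    return mean, std, rms


# Made-up model-minus-observation departures, in ppm
mean, std, rms = departure_stats([1.0, -1.0, 3.0, -1.0])
# MS = variance + mean^2, so the RMS keeps the bias that the std removes
print(mean, std ** 2, rms ** 2)
```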
 The RMS of the prior departures is shown in Figure 1a for the six zones: it ranges between about 1.3 and 1.9 ppm, the largest value being for LN20N. For a single year, as here, the RMS hardly differs from the standard deviation (not shown). The mean square (MS) of the prior departures (shown in pink in Figure 1b) represents the left-hand side of equation (4). The components of the corresponding right-hand side are shown in Figure 1b as a stacked histogram. The component coming from the uncertain surface fluxes, $\mathbf{H}\hat{\mathbf{B}}\mathbf{H}^T$, has RMS values of about 1.0 ppm (between 0.8 ppm in SS20S and 1.2 ppm in LTROP). The RMS of the component from the transport model uncertainty, $\mathbf{R}_I^H$, is about 0.5 ppm. The quadratic means of the retrieval posterior errors for single soundings, $\mathbf{R}_I^M$, are about 1.4 ppm over land and 1.2 ppm over ocean. Consistent with the initial hypotheses (section 'Method'), all statistics on the right-hand side are unbiased, so that their RMS equals their standard deviation and their MS equals their variance. Unsurprisingly, the equality of equation (4) is not strictly achieved (Figure 1b), but a fair agreement is seen for all regions studied, with variance differences of less than 0.8 ppm², the assigned error statistics being conservatively pessimistic.
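Because the right-hand side of equation (4) is a sum of variances, its components add in quadrature. The sketch below uses the rounded values quoted above (about 1.0 ppm for the flux term, 0.5 ppm for transport, and 1.4 ppm over land or 1.2 ppm over ocean for the retrievals) purely as an illustration; it assumes unbiased, mutually uncorrelated terms and does not reproduce the per-region bars of Figure 1b.

```python
import math


def budget_rms(*component_rms_ppm):
    """Combine independent error components (given as RMS, ppm) in quadrature.

    Returns (mean square in ppm^2, RMS in ppm), i.e. the right-hand side of
    equation (4) for a single sounding, assuming unbiased, uncorrelated terms.
    """
    ms = sum(c * c for c in component_rms_ppm)
    return ms, math.sqrt(ms)


# Rounded values quoted in the text: flux (H B H^T), transport (R_I^H), retrieval (R_I^M)
land_ms, land_rms = budget_rms(1.0, 0.5, 1.4)
ocean_ms, ocean_rms = budget_rms(1.0, 0.5, 1.2)
print(f"land: {land_ms:.2f} ppm^2 ({land_rms:.2f} ppm); "
      f"ocean: {ocean_ms:.2f} ppm^2 ({ocean_rms:.2f} ppm)")
# → land: 3.21 ppm^2 (1.79 ppm); ocean: 2.69 ppm^2 (1.64 ppm)
```

The retrieval term alone contributes more than half of the total variance in both cases, which is why the retrieval errors dominate the departure budget.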
 Figure 2a displays the corresponding RMS of the posterior departures (i.e., after the assimilation of surface air sample measurements): the values range between 1.1 ppm (STROP) and 1.5 ppm (LN20N). They are in fair agreement with the assigned error statistics, from which they differ by less than 0.2 ppm. The $\mathbf{H}\hat{\mathbf{A}}\mathbf{H}^T$ values (Figure 2b) correspond to standard deviations between 0.2 ppm (SN20N) and 0.6 ppm (LTROP). They constitute a small part of the posterior departure error budget, which is dominated by the retrieval errors (Figure 2b).
 To evaluate the performance of an observing system, it is usual to compute error reductions, defined as (1 − σa/σb) × 100, in %, where σa and σb are the posterior and prior error standard deviations, respectively, of a given target quantity such as carbon fluxes [e.g., Hungershöfer et al., 2010]. Considering the total columns measured by GOSAT and including the retrieval and transport model errors, our results lead to theoretical reductions of the error in the predicted GOSAT XCO2, brought by the surface network, between 14% (LN20N) and 17% (STROP). The error reduction seen in the actual mismatches is slightly larger, between 16% and 18%, except over the midlatitude oceans, where no error reduction is seen in practice. Over the midlatitude oceans, the assigned prior error statistics appear to be overestimated, while the posterior errors are about right.
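The error-reduction formula can be sketched in a few lines. The variance inputs below are hypothetical round numbers, not values from this study; they are chosen to show how a large reduction of the flux term translates into a much smaller reduction of the total XCO2 error once the unchanged retrieval and transport variances are included.

```python
import math


def error_reduction_percent(sigma_a, sigma_b):
    """Error reduction (1 - sigma_a / sigma_b) x 100, in %."""
    return (1.0 - sigma_a / sigma_b) * 100.0


def xco2_sigma(flux_var, transport_var, retrieval_var):
    """Total XCO2 error std. dev. (ppm) from independent variance terms (ppm^2)."""
    return math.sqrt(flux_var + transport_var + retrieval_var)


# Hypothetical variances in ppm^2: only the flux term changes after assimilation
transport_var, retrieval_var = 0.25, 1.0
sigma_b = xco2_sigma(1.0, transport_var, retrieval_var)    # prior: 1.5 ppm
sigma_a = xco2_sigma(0.19, transport_var, retrieval_var)   # posterior: 1.2 ppm
print(round(error_reduction_percent(sigma_a, sigma_b), 1))        # → 20.0 (total XCO2 error)
print(round(error_reduction_percent(math.sqrt(0.19), 1.0), 1))    # → 56.4 (flux term alone)
```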
6 Discussion and Conclusions
 There is a need to strengthen the statistical rigor of atmospheric inversions, because their setup usually involves a fair level of empiricism that affects the performance of the method. To this end, the Bayesian framework provides a paradigm against which actual systems can be benchmarked. In particular, perfect systems satisfy a series of statistical properties [e.g., Desroziers et al., 2005] that can, at a minimum, contribute to evaluating them and may even help tune the assigned error statistics [Michalak et al., 2005; Winiarek et al., 2012]. In the present paper, we have evaluated the consistency between the statistics of model-minus-observation departures and the carbon flux error statistics assigned in an atmospheric inversion assimilating surface air sample measurements. The departures have been computed from independent measurements of the CO2 total column retrieved from GOSAT. The detailed structure of the prior error covariance matrix cannot be seen from such measurements because atmospheric transport blurs it, but a broad picture of its realism is obtained. We have seen that the RMS of the prior departures is less than 2.0 ppm, a value that gives an upper bound for the prior flux errors when projected into XCO2 space. The assigned uncertainty for these prior flux errors actually varies between 0.9 and 1.2 ppm when projected into XCO2 space. When summed with the retrieval and transport model error statistics, the modeled uncertainty nears the actual error budget, with a slight overestimation, which indicates the realism of $\hat{\mathbf{B}}$. Note that the lack of vertical resolution of the GOSAT XCO2 data (accounted for in H) only allows evaluating very broad features of $\hat{\mathbf{B}}$, without distinguishing between variances and correlations in this matrix. 
Ideally, we would like to focus on the uncertainty within the boundary layer in order to study the flux errors at higher resolution, but the transport model errors RIH would interfere more in this case.
 Incidentally, the statistics of the prior model-minus-observation departures also indicate the level of observation bias that the inversion can tolerate when assimilating these observations [Chevallier et al., 2005a]. Indeed, equation (1) shows that biases in the observations drive the analysis if they are commensurate with the prior departures. Hence, biases should ideally be much smaller (e.g., tenfold smaller) than the departure statistics. This rule of thumb indicates that keeping the systematic errors of the retrieved XCO2 within a couple of tenths of a ppm is important for flux inversion using these data. The near closure of the posterior error budget suggests that the residual biases of the ACOS retrievals, which have already been empirically corrected for gross biases, are not far from this target.
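Both halves of this argument can be sketched numerically. Because equation (1) is linear in y, a constant bias added to y shifts the analysis by K times that bias; and the tenfold rule of thumb, applied to the prior-departure RMS range reported in the Results (about 1.3 to 1.9 ppm), gives the "couple of tenths of a ppm" target. The gain values used below are hypothetical.

```python
def bias_tolerance_ppm(departure_rms_ppm, factor=10.0):
    """Rule of thumb from the text: biases about `factor` times smaller than the departure RMS."""
    return departure_rms_ppm / factor


def analysis_shift(K, bias):
    """Shift of xa caused by a constant bias added to y, from equation (1): K * bias."""
    return [k * bias for k in K]


# Range of prior-departure RMS reported in the Results (about 1.3 to 1.9 ppm)
print(bias_tolerance_ppm(1.3), bias_tolerance_ppm(1.9))
# Hypothetical gain for two flux components and one observation, with a 0.2 ppm bias
print(analysis_shift([0.5, 0.5], 0.2))  # → [0.1, 0.1]
```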
 The RMSs of the departures after assimilation of surface air sample measurements were shown to be less than 1.6 ppm and are marginally larger than the retrieval errors. They agree fairly well with the theoretical error statistics, which exceed them by only a few tenths of a ppm. This systematic agreement in the six regions studied indicates that the inversion system correctly estimates that posterior uncertainties become negligible compared with the single-sounding retrieval errors, and that the retrieval errors are fairly represented. The level of uncertainty of the GOSAT XCO2 data does not allow a further assessment of the quality of $\hat{\mathbf{A}}$ (in particular, a possible twofold underestimation or overestimation can hardly be noticed), but these data provide first factual evidence of its realism. The fact that the uncertainty of the simulated XCO2 field becomes negligible compared with the retrieval errors, even over tropical lands, implies that complementing the information provided by the current surface network is a major challenge for the current GOSAT data. Finally, we note that the skewness of the posterior departures (which is dominated by retrieval errors) is less than 0.4 for the six regions defined in the paper, which suggests that modeling the retrieval errors with a normal distribution is a fair assumption.
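A skewness check of this kind can be sketched as follows. The text does not say which estimator was used; the biased (population) moment estimator g1 below is our assumption, and so is comparing its absolute value with 0.4, a threshold we simply borrow from the value quoted above as an illustrative yardstick rather than a formal normality test.

```python
def sample_skewness(values):
    """Biased (population) sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def looks_symmetric(departures, threshold=0.4):
    """True when |skewness| is below the (assumed) illustrative threshold."""
    return abs(sample_skewness(departures)) < threshold


print(sample_skewness([-2.0, -1.0, 0.0, 1.0, 2.0]))  # → 0.0 (symmetric sample)
print(looks_symmetric([0.0, 0.0, 0.0, 1.0]))         # → False (skewness about 1.15)
```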
 We have seen that the assimilation of surface air sample observations does improve the simulation of the total column. The inversion system fairly represents the actual values of the uncertainty reduction for the simulation of the GOSAT data (16–18%), even slightly underestimating them, except over the midlatitude oceans, where no error reduction is seen in practice. Based on the GOSAT data alone, it is not possible to identify what, in the inversion setup, prevents the error reduction over the midlatitude oceans: it could be prior flux errors assigned too large over these oceans, or transport inaccuracies between these oceans and the surface stations that would not be well reflected in the surface observation errors. The good agreement obtained over the rest of the globe without any tuning shows that an inversion system can, in principle, yield reliable error statistics at the large flux scales that correspond to the imprint of column measurements, even within a Gaussian linear framework, and can therefore fairly diagnose its own strengths and weaknesses. Finally, we argue that consistency checks of the inversion error statistics, like those presented here, should be part of the standard evaluation toolbox of atmospheric inversions. We suggest that this error analysis framework could be usefully applied to most carbon flux inversion systems that are confronted with multiple data sources.
 The GOSAT team at JAXA/NIES/MOE provided the GOSAT TANSO-FTS level 1B data product input to the ACOS level 2 production process. The authors would like to acknowledge the work of the NASA ACOS project to create the XCO2 data and especially to acknowledge Paul Wennberg and David Crisp for their leadership in facilitating the NASA-GOSAT collaboration that underlies the ACOS project. This work was performed using HPC resources from DSM-CCRT and [CCRT/CINES/IDRIS] under the allocation 2012-t2012012201 made by GENCI (Grand Equipement National de Calcul Intensif). It was cofunded by the European Space Agency under the GHG-CCI project and the European Commission under the EU Seventh Research Framework Programme (grant agreement 283576, MACC II). CO is funded under a subcontract through the NASA Jet Propulsion Laboratory.