In any data assimilation framework, the background error covariance statistics play the critical role of filtering the observed information and determining the quality of the analysis. For atmospheric CO2 data assimilation, however, the background errors cannot be prescribed via traditional forecast or ensemble-based techniques as these fail to account for the uncertainties in the carbon emissions and uptake, or for the errors associated with the CO2 transport model. We propose an approach where the differences between two modeled CO2 concentration fields, based on different but plausible CO2 flux distributions and atmospheric transport models, are used as a proxy for the statistics of the background errors. The resulting error statistics: (1) vary regionally and seasonally to better capture the uncertainty in the background CO2 field, and (2) have a positive impact on the analysis estimates by allowing observations to adjust predictions over large areas. A state-of-the-art four-dimensional variational (4D-VAR) system developed at the European Centre for Medium-Range Weather Forecasts (ECMWF) is used to illustrate the impact of the proposed approach for characterizing background error statistics on atmospheric CO2 concentration estimates. Observations from the Greenhouse gases Observing SATellite “IBUKI” (GOSAT) are assimilated into the ECMWF 4D-VAR system along with meteorological variables, using both the new error statistics and those based on a traditional forecast-based technique. Evaluation of the four-dimensional CO2 fields against independent CO2 observations confirms that the performance of the data assimilation system improves substantially in the summer, when significant variability and uncertainty in the fluxes are present.
 Atmospheric CO2 observations provide a powerful constraint on net sources and sinks of CO2, as well as their spatial and temporal distribution [e.g., Le Quéré et al., 2009]. The advent of several satellite-based instruments for observing CO2 is expected to provide better insight into the critical controls over atmospheric CO2 growth [e.g., Scholes et al., 2009] and improve atmospheric flux inversions [e.g., Chevallier et al., 2009b; Baker et al., 2010]. Assessing the information content of remote-sensing data sets, however, is difficult due to sampling limitations, data gaps, and incomplete error characterization in the observations. These data gaps can be caused by geophysical limitations, such as clouds and aerosols [e.g., Bösch et al., 2006; Tiwari et al., 2006], as well as by retrieval uncertainties. Further analysis in the form of statistical mapping [e.g., Alkhaled et al., 2008; Hammerling et al., 2012a, 2012b] or binning and averaging [e.g., Tiwari et al., 2006; Crevoisier et al., 2009; Kulawik et al., 2010] is often necessary in order to create full-coverage maps and to reduce the errors associated with the data, thereby maximizing the usefulness of the data.
 An alternative approach is to use data assimilation (DA) techniques to extract information about global atmospheric CO2 distributions from the available observations. Atmospheric CO2 data assimilation [e.g., Engelen et al., 2004; Engelen and McNally, 2005; Engelen et al., 2009; Liu et al., 2012] yields four-dimensional fields of atmospheric CO2 concentrations that are statistically consistent with (1) the information provided by the CO2 observations, and (2) additional sources of information such as model estimates of CO2 fluxes, atmospheric transport, etc. These additional sources of information, commonly termed the a priori model state (or the background state), are typically obtained from a short-range forecast valid at the time of assimilation. The background state incorporates knowledge about the mechanistic processes governing the carbon cycle and/or imposes physical or dynamical constraints on the assimilation. An attractive feature of the DA framework [e.g., Engelen et al., 2009] is that, along with CO2 observations, it is also possible to assimilate relevant meteorological variables such as temperature and humidity, which are known to affect the observed radiances from which CO2 information is derived. The final analysis is a consistent estimate of the atmospheric CO2 concentrations which, if produced with sufficiently high accuracy, can be used to improve estimates of CO2 sources and sinks [e.g., Chevallier et al., 2009a].
 In all DA applications, ranging from atmospheric CO2 or other trace gases to Numerical Weather Prediction (NWP), the observations and the background are weighted [e.g., Nichols, 2010] based on the accuracy of the information sources. Like any other information source, the background state is prone to errors, which are accounted for through the so-called background error covariance statistics [e.g., Bannister, 2008a]. It is now well accepted that a realistic representation of the background error distribution is very important for successful data assimilation. The background error statistics serve to filter and spatially spread the information provided by the observations, and to impose correlation among different model variables. The critical role played by the background has been demonstrated by Cardinali et al. , for example, where it was shown that only 15% of the information content of a well-balanced meteorological analysis is attributable to the assimilated observations; the remaining 85% of the information is provided by the background.
 A conceptual definition of background error is relatively trivial, in the sense that it corresponds to the statistical difference between the background atmospheric state and the true atmospheric state. Nonetheless, realistic estimation of the error statistics is not straightforward for several reasons. First, the true atmospheric state is never exactly known, such that the background errors and associated covariances must be estimated from surrogate data. A second difficulty is that error contributions to the background are relatively intricate, because the background is the result of a complex data assimilation procedure that involves interplay between the observations assimilated in the past, the analysis formulation, and a forecast model operator. A third difficulty is that the size of the full background error covariance matrix is too large to be stored explicitly, necessitating a variety of reduced-rank approximations to make the computations feasible.
 Various techniques have been developed to tackle these challenges [e.g., Bannister, 2008a, 2008b], and there is now a substantial literature on background error statistics for NWP, covering their nature, estimation, and practical implementation in operational settings [e.g., Derber and Bouttier, 1999; Fisher, 2004, 2006; Belo Pereira and Berre, 2006; Buehner and Charron, 2007; Pannekoucke et al., 2007, 2008; Raynaud et al., 2009; Hess, 2010; Bonavita et al., 2011; Brousseau et al., 2012]. Similarly, there is an increasing focus on estimation of background error statistics for constituent assimilation applications [e.g., Benedetti and Fisher, 2007; Chai et al., 2007; Constantinescu et al., 2007; Kahnert, 2008; Singh et al., 2011; Massart et al., 2012], which requires a somewhat different approach than the NWP problem. This is primarily because time-varying boundary values are optimized instead of an initial condition (i.e., the atmospheric state) at the start of a relatively short assimilation window. More recently, an increasing number of studies [e.g., Chai et al., 2007; Buehner and Charron, 2007; Berre and Desroziers, 2010; Singh et al., 2011] have demonstrated that, for both NWP and constituent assimilation applications, it is crucial to take into account the spatial correlations in the background errors, and therefore use nondiagonal error covariance matrices to represent these errors.
 Existing forecast or ensemble-based techniques, however, are problematic for atmospheric CO2 data assimilation, as they are unable to explicitly take into account the significant uncertainty associated with the surface CO2 fluxes. Spatial and temporal variations in the CO2 fluxes are a key driver behind atmospheric CO2 distributions, and consequently the measured CO2 concentrations. Additionally, the errors associated with the CO2 transport model are not adequately characterized in the majority of these approaches. The prescribed background error statistics are based only on the internal uncertainties in the transport model, which are not a realistic representation of the true transport model errors. Failure to capture the true magnitude of the transport errors and/or the uncertainty of the fluxes results in an underestimation of the background error statistics, which makes the assimilation devalue the observations in favor of the background. This underestimation can be severe, for example, during the Northern Hemisphere summer when the uncertainties in the background CO2 fluxes are typically high. One solution to this problem is to artificially inflate the background error [e.g., Engelen et al., 2009], where the magnitude of the inflation factor may be determined based on comparisons between the CO2 model concentrations and independent surface observations. This strategy still does not account for the spatial error correlations due to the flux uncertainties, however.
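The effect of an underestimated background error on the weighting can be illustrated with a scalar sketch. This is a textbook single-observation blend, not the ECMWF 4D-VAR formulation, and all names and numbers (`scalar_analysis`, the ppm values) are hypothetical and chosen only for illustration.

```python
def scalar_analysis(background, observation, var_b, var_o):
    """Blend one background value and one observation of the same quantity.

    var_b and var_o are the assumed background and observation error variances.
    """
    gain = var_b / (var_b + var_o)  # weight given to the observation
    analysis = background + gain * (observation - background)
    return analysis, gain


# Illustrative numbers only (ppm and ppm^2); they are not taken from the study.
xb, y, var_o = 390.0, 394.0, 4.0

# Background error variance comparable to the observation error variance.
realistic, gain_realistic = scalar_analysis(xb, y, var_b=4.0, var_o=var_o)

# Underestimated background error variance: the observation is largely ignored.
underestimated, gain_under = scalar_analysis(xb, y, var_b=0.25, var_o=var_o)
```

With the smaller background variance the gain shrinks, so the analysis stays close to the background even though the observation differs from it by 4 ppm. Inflating the background error counteracts this, but only in magnitude, not in spatial structure.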
 The primary goal of this paper is to outline and test an approach for parameterizing the background error covariance for atmospheric CO2 data assimilation, in a way that includes the statistics of errors resulting from both flux and transport uncertainties. The proposed approach is based on the assumption that the difference between total CO2 concentrations (henceforth, denoted as ΔCO2) from two state-of-the-art global models is statistically representative of the background errors. Because any two models provide a limited sample of the true background error distribution, it is beneficial to use models that are different both in terms of the underlying fluxes and the transport fields driving them to capture realistic variability in the background error. The error statistics are then generated from the ΔCO2 fields using spatial statistical tools.
 In this study, the resulting background error statistics are implemented within the atmospheric CO2 4D-VAR system at ECMWF. This system, as described in Engelen et al. , is akin to an NWP-DA setup, in which CO2 mixing ratios are constrained along with other atmospheric variables such as temperature, winds, surface pressure, and humidity, to obtain a consistent estimate of the atmospheric CO2 concentrations. Experiments are designed to evaluate (a) whether the difference between two models can be used as a proxy for the statistics of the background errors, (b) the extent to which the representation of background errors has a discernible impact on CO2 estimates, and (c) whether including more realistic statistics of errors improves the performance of the data assimilation system, relative to simply accounting for internal transport model uncertainties as available from a forecast-based technique (standard version of the NMC method, according to the terminology of Široká et al. ). The predictions of the 4D-VAR analyses are evaluated using independent observations of CO2 from aircraft profiles and observations of column-averaged dry mole fractions of CO2 (i.e., XCO2) from the Total Column Carbon Observing Network (TCCON).
 Although the application presented here primarily focuses on the operational ECMWF 4D-VAR system, the proposed Δ-statistics approach is also applicable to ensemble data assimilation systems. Within an ensemble system, the Δ-statistics approach may be used to define an initial background state, which would be refined as the background state of each ensemble member is perturbed further during the DA process. Additionally, this study focuses on atmospheric CO2 data assimilation, but the Δ-statistics approach is relevant for other trace gas assimilation applications, especially ones in which the background errors are influenced by both atmospheric transport and emission patterns.
2 Experimental Framework
2.1 Four-Dimensional Variational Data Assimilation
 The atmospheric 4D-VAR data assimilation system used in this study is based on the ECMWF Integrated Forecasting System (IFS) transport model with CO2 fluxes prescribed at the surface based on climatological and inventory data [e.g., Engelen et al., 2009; Hollingsworth et al., 2008]. The system assimilates the same meteorological observations as the operational ECMWF system, along with observations constraining CO2.
 In addition to the specification of the background covariance matrix, which forms the primary focus of this work (section 2.2), the data assimilation system here includes two major changes relative to Engelen et al. . First, the IFS version has been updated, and the version used in this study is based on CY37r3 (ECMWF, IFS documentation CY37r2, 2011, http://www.ecmwf.int/research/ifsdocs/CY37r2/index.html), which became operational in November 2011 and incorporates several improvements to the atmospheric forecast model and the data assimilation system at ECMWF. Second, while the focus in Engelen et al.  was on the assimilation of radiances from the Atmospheric Infrared Sounder (AIRS) [Aumann et al., 2003] and the Infrared Atmospheric Sounding Interferometer (IASI) [Siméoni et al., 1997], the current study uses Level 2 retrieval data from the Greenhouse Gases Observing Satellite “IBUKI” (GOSAT) [Kuze et al., 2009; Yokota et al., 2009], which are obtained from version 2.9 of the ACOS algorithm [e.g., O'Dell et al., 2012; Crisp et al., 2012]. Based on the recommendations of the ACOS team, only the high (H) gain observations with the master quality flag equal to “good” are used in the assimilation process, with the observation errors set to 1% of the observed value. Although first estimates for bias correction of the ACOS data do exist [Wunch et al., 2011a], these are not used in the experiments in this study. The GOSAT data provide a stronger constraint on CO2 near the surface relative to AIRS and IASI observations, but have relatively poor geographical coverage.
2.2 Specification of the Background Error Statistics
2.2.1 NMC Method
 The background error covariance matrix used in Engelen et al.  is based on the NMC method (Parrish and Derber ), named after the National Meteorological Center, now the National Centers for Environmental Prediction. The NMC method uses a surrogate quantity to represent the background errors, where the surrogate is typically chosen to be the differences between forecasts of different lengths valid at the same time. In its simplest form [e.g., Široká et al., 2003], the NMC method is implemented by taking the differences between 24 h forecasts and 12 h forecasts over a 1 month period. The main advantage of this method is that the forecasts required for calculating the background error statistics are readily available during the DA process.
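The simplest form of the NMC method described above can be sketched as a sample covariance of forecast differences. The function name `nmc_covariance` and the toy two-element state vectors are hypothetical. Operational implementations work in a transformed control space and may rescale the result, which this sketch does not attempt.

```python
def nmc_covariance(fc_long, fc_short):
    """Sample covariance of forecast differences valid at the same times.

    fc_long, fc_short: lists of equal-length state vectors, one per valid
    time, e.g. 24 h and 12 h forecasts collected over one month.
    """
    # Surrogate "errors": longer-range minus shorter-range forecast.
    diffs = [[a - b for a, b in zip(fl, fs)] for fl, fs in zip(fc_long, fc_short)]
    n_times, n_state = len(diffs), len(diffs[0])
    mean = [sum(d[k] for d in diffs) / n_times for k in range(n_state)]

    cov = [[0.0] * n_state for _ in range(n_state)]
    for d in diffs:
        for p in range(n_state):
            for q in range(n_state):
                cov[p][q] += (d[p] - mean[p]) * (d[q] - mean[q]) / (n_times - 1)
    return cov


# Toy example: three valid times, two state elements (arbitrary units).
fc24 = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
fc12 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b_nmc = nmc_covariance(fc24, fc12)
```

Because both forecasts share the same prescribed fluxes, any flux error cancels in the difference. This is the limitation for CO2 discussed in the next paragraph.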
 One limitation common to all applications of the NMC method is that the variance of background errors in data-sparse regions is underestimated [e.g., Berre et al., 2006], because differences between forecasts of different lengths are partially attributable to information from observations within the period between the starting times of the two forecasts. A second limitation, specific to atmospheric CO2 data assimilation, is that this approach does not account for the uncertainty associated with the underlying fluxes, because the two forecasts derive from the same set of prescribed fluxes. The NMC method is able to account for internal (i.e., within the same model) transport uncertainties only, and cannot represent systematic errors due to the transport model itself. The formalism of the NMC method (i.e., the analysis of increments) emphasizes the small-scale structures in the background CO2, missing many of the large-scale error patterns due to the dominant flux errors. Interestingly, this is different from NWP-related applications, where the observation errors are the main driver for the 6–12 h background forecast errors, and it has been pointed out [e.g., Berre et al., 2006; Belo Pereira and Berre, 2006; Storto and Randriamampianina, 2010] that the NMC method likely overestimates the error correlations in that context. Finally, if there is no significant seasonality in the CO2 observational constraint, then the error statistics defined via the NMC method remain invariant in time. Because the NMC statistics are based only on the internal (i.e., within model) transport uncertainties, they exhibit very little seasonal variability in the background errors.
 While there may be ways to extend the NMC method to take into account flux errors and seasonal variability in internal transport model uncertainties, these are neither straightforward to implement nor able to account for the spatial correlation in the errors. Such extensions may also lose the advantage that the forecasts can be obtained directly from existing runs; hence, the computational expense of generating the background error statistics may increase significantly.
2.2.2 Δ-Statistics Method
 Analogous to the NMC method, the Δ-statistics method is formulated using a surrogate quantity, which is chosen to be the difference in the modeled CO2 concentrations (ΔCO2) from two global models. In this study, the two selected models are: (1) the GSFC Parameterized Chemistry and Transport Model (PCTM) [Kawa et al., 2004] driven by analyzed meteorological fields from NASA's Goddard Earth Observing System, version 4 (GEOS-4) with terrestrial biospheric sources and sinks based on computations of net primary productivity from the Carnegie-Ames Stanford Approach (CASA) [Randerson et al., 1997] model and (2) the ECMWF Integrated Forecasting System (IFS) (http://www.ecmwf.int/research/ifsdocs/CY37r2/index.html) model with the terrestrial biospheric fluxes prescribed from the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE) [Krinner et al., 2005] model. Specifications for the other flux components (ocean, anthropogenic, and wildfire/biomass burning) are outlined in Table S1 in the supporting information. The main differences between the underlying fluxes are in the biospheric (CASA versus ORCHIDEE) and the anthropogenic (modeled versus compiled) CO2 flux distributions, which, when propagated by different atmospheric transport (PCTM-GEOS4 versus IFS), result in two distinct specifications of CO2 concentrations. Differences between the biospheric flux models have been reported in the literature [e.g., Xueref-Remy et al., 2011; Huntzinger et al., 2012], and we refer the reader to these studies for a more comprehensive overview.
 The differences between the two modeled CO2 concentration fields (ΔCO2; Figure 1) are obtained 3-hourly at a horizontal resolution of 1° × 1.25° on 60 sigma-hybrid levels of the atmosphere; the horizontal and vertical error correlations are generated from these difference fields. Note that the two modeled CO2 concentration fields are initialized using model-specific conditions and represent two independent long-term runs, which are not reinitialized during the analysis period. Any global offset between the two model runs is subtracted prior to the analysis. In this way, the error statistics used in the analysis represent long-term, rather than short-term, differences between the models and are likely to be conservative, as they do not account for the reduction in background errors resulting from the assimilation of atmospheric data. By using conservative background error statistics, the approach allows the assimilation to put more weight on the observations.
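The construction of the surrogate field can be sketched as follows. How the global offset is computed is not specified in the text, so the weighted-mean definition used here (for example, with area weights) is an assumption, as are the function name `delta_co2` and the toy values.

```python
def delta_co2(field_a, field_b, weights=None):
    """Difference of two gridded CO2 fields with the global offset removed.

    field_a, field_b: flattened fields on a common grid (ppm).
    weights: optional per-cell weights (e.g. grid-cell areas); equal weights
    are assumed when omitted. The weighted-mean offset is an assumption.
    """
    diff = [a - b for a, b in zip(field_a, field_b)]
    if weights is None:
        weights = [1.0] * len(diff)
    offset = sum(w * d for w, d in zip(weights, diff)) / sum(weights)
    return [d - offset for d in diff]


# Toy example: three grid cells, values in ppm (illustrative only).
model_a = [392.0, 391.0, 390.0]
model_b = [390.0, 390.0, 390.0]
delta = delta_co2(model_a, model_b)
```

Removing the offset keeps a uniform bias between the two long-term runs from being counted as spatially structured background error.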
 The vertical error correlations distribute the information of the observations in the vertical under the assumption that an error at a specific height is correlated with errors at other heights. The vertical error correlations are obtained by calculating the correlation coefficients of ΔCO2 between different atmospheric (or model) levels. These vertical error correlations are constant in space but vary monthly to capture the seasonality in the ΔCO2 field. The technique can be extended to consider different vertical error correlations for different geographic areas. Within a DA framework, both the vertical background error correlations and an averaging kernel constrain the model profile. In this study, only the vertical error correlations are obtained using the Δ-statistics (or the NMC) method, while the averaging kernel is obtained from the GOSAT-ACOS data.
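As a minimal sketch of the vertical step, the correlation coefficients between levels can be computed from ΔCO2 sampled at the same horizontal grid points on every level. The function name `level_correlation` and the toy three-level data are hypothetical. Pooling all grid points into one matrix mirrors the statement that the correlations are constant in space.

```python
import math


def level_correlation(delta_by_level):
    """Correlation matrix between model levels.

    delta_by_level[k] holds the delta-CO2 values on level k, sampled at the
    same horizontal grid points for every level.
    """
    def pearson(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
        su = math.sqrt(sum((a - mu) ** 2 for a in u))
        sv = math.sqrt(sum((b - mv) ** 2 for b in v))
        return cov / (su * sv)

    n = len(delta_by_level)
    return [[pearson(delta_by_level[p], delta_by_level[q]) for q in range(n)]
            for p in range(n)]


# Toy example: three levels, four grid points (illustrative values, ppm).
levels = [
    [1.0, -1.0, 2.0, -2.0],
    [0.5, -0.5, 1.0, -1.0],   # same pattern as level 0, damped
    [1.0, 1.0, -1.0, -1.0],   # pattern unrelated to level 0
]
corr = level_correlation(levels)
```

Repeating the calculation for each month's ΔCO2 field would give the monthly variation described above.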
 The horizontal error correlations define how errors are correlated between grid boxes, and define the degree to which CO2 is adjusted around grid boxes containing observations. The horizontal error correlations are obtained from a variogram analysis, a quantitative tool in geostatistics that has previously been used to characterize the spatial and temporal structure of atmospheric CO2 [e.g., Alkhaled et al., 2008; Hammerling et al., 2012a]. The horizontal correlations are themselves spatially variable and are defined separately for each model level.
 Separate horizontal error statistics are calculated for each model level due to the difference in the patterns of CO2 gradients at different levels of the atmosphere (Figure 1). Near the surface (Figures 1a and 1b), uncertainties in the prescribed fluxes result in higher errors in the background. In addition, the interaction between boundary layer dynamics and biospheric emissions (also known as “the rectifier effect,” Denning et al. ) contributes to large variability in the CO2 concentrations near the surface. Conversely, in the free troposphere (Figures 1c and 1d), CO2 is more dispersed and well-mixed, yielding smaller and smoother errors that are primarily impacted by the variability in atmospheric transport. Figure 1 shows the ΔCO2 fields for two typical months—January and June 2010, which are representative of Northern Hemispheric winter and summer, respectively. As can be seen in this figure, the seasonality in the ΔCO2 fields is more evident near the surface (Figures 1a and 1b) relative to higher levels in the atmosphere (Figures 1c and 1d).
 The plots in Figure 1 also show that within each model level, significant regional variability exists in the ΔCO2 fields. These reflect the regional differences in surface fluxes between the models, as well as differences due to different representations of global atmospheric transport. Previous work by Alkhaled et al.  and Hammerling et al. [2012a] using column-averaged CO2 concentrations (i.e., XCO2) highlighted that significant regional variability exists in the spatial covariance structure of atmospheric CO2 concentrations, and hence, any spatial analysis needs to be done regionally rather than globally. Keeping in mind that the ΔCO2 field is a nonstationary process with variances and/or correlation lengths varying in space and time, we use a “moving window” analysis to calculate the horizontal error correlations.
 At any given model level, the regional variability of ΔCO2 is quantified using the semivariogram, $\gamma_{\mathrm{region}}(d)$, as

$$\gamma_{\mathrm{region}}(d) = \frac{1}{2N(d)} \sum_{i=1}^{N(d)} \left[ \Delta\mathrm{CO_2}(x_{\mathrm{region},i}) - \Delta\mathrm{CO_2}(x_{\mathrm{region},i} + d) \right]^2 \tag{1}$$

where $N(d)$ is the number of pairs of points separated by distance $d$ and, similar to Alkhaled et al. : (a) regions are defined as overlapping 2000 km radius circles centered at each model grid cell, and (b) $\gamma_{\mathrm{region}}(d)$ is constructed using pairs of points, with the first point being within the specified region ($\Delta\mathrm{CO_2}(x_{\mathrm{region}})$) and the other point being either within or outside that region ($\Delta\mathrm{CO_2}(x_{\mathrm{region}} + d)$). This approach accounts for both the observed variability within each subregion (by using all available pairs of points within a region) and large-scale variability (by using a random sample of the points outside the region). Once the regional variability of ΔCO2 is defined, an exponential variogram model ($\gamma_{\mathrm{theo}}(d)$) is selected to represent the spatial autocorrelation structure. This model captures the decay in the spatial correlation as a function of separation distance ($d$), parameterized by a variance ($\sigma^2$) and a range ($l$) parameter (equation (2)). While a variety of approaches has been suggested in the literature [e.g., Chiles and Delfiner, 2012] for fitting these parameters, we use the limited memory BFGS (L-BFGS) [Nocedal and Wright, 2006] minimization scheme for its computational efficiency. The horizontal error correlation coefficients are then obtained as the ratio $Q(d)/\sigma^2$, where

$$\gamma_{\mathrm{theo}}(d) = \sigma^2 - Q(d), \qquad Q(d) = \sigma^2 \exp\left(-\frac{d}{l}\right) \tag{2}$$
 The choice of a simple isotropic exponential decay model is based on the nature of the underlying physical process and on the examination of the behavior of the variability, especially at smaller separation distances. Although a more complex variogram model might be justified in areas with strong boundaries (e.g., near coasts), we err on the side of parsimony and keep the variogram model uniform while allowing its parameters to vary in space and time.
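The variogram step can be sketched on a one-dimensional transect. The sketch assumes the standard exponential form σ²(1 − exp(−d/l)) for the variogram model, uses plain separation along a line rather than distances on the sphere, and replaces the L-BFGS minimization with an exhaustive grid search to stay dependency-free. All function names are hypothetical.

```python
import math


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Mean of 0.5 * (z_i - z_j)^2 over point pairs, binned by separation."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = abs(coords[i] - coords[j])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    centers = [(b + 0.5) * bin_width for b in range(n_bins)]
    # Skip empty bins; return (bin-center distance, semivariance) pairs.
    return [(c, s / n) for c, s, n in zip(centers, sums, counts) if n > 0]


def exp_variogram(d, sigma2, ell):
    """Assumed exponential model: variance sigma2, range parameter ell."""
    return sigma2 * (1.0 - math.exp(-d / ell))


def fit_exponential(points, sigma2_grid, ell_grid):
    """Least-squares fit by grid search (a stand-in for L-BFGS)."""
    best = None
    for s2 in sigma2_grid:
        for ell in ell_grid:
            sse = sum((g - exp_variogram(d, s2, ell)) ** 2 for d, g in points)
            if best is None or sse < best[0]:
                best = (sse, s2, ell)
    return best[1], best[2]


def correlation(d, ell):
    """Correlation implied by the exponential model at separation d."""
    return math.exp(-d / ell)


# Synthetic check: recover known parameters from noise-free model values.
synthetic = [(d, exp_variogram(d, 4.0, 1000.0)) for d in (250.0, 750.0, 1500.0, 3000.0)]
sigma2_hat, ell_hat = fit_exponential(synthetic, [2.0, 4.0, 6.0], [500.0, 1000.0, 2000.0])
```

In the moving-window setting, the same fit would be repeated for the pairs associated with each region, giving a variance and a range parameter per grid cell and model level.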
 The horizontal and vertical error correlations are generated for 1 day each month (on the 15th day at 1800 h UTC), and are assumed to be representative of the typical variability that would be observed during any individual day in the month. Ideally, the error correlations should be generated for all time periods (i.e., 3 hourly) within a month when the modeled CO2 concentrations are available. This would avoid any form of spatial and/or temporal bias arising due to the way synoptic variability is handled in the individual models. At daily or subdaily time scales, however, any biases resulting from the selection of a particular set of models with their inherent physical processes, atmospheric transport patterns, etc., are mostly local and remain confined within the boundary layer. The error statistics did not vary substantially for different times of the day (for example, 0600 h UTC and 1800 h UTC) and for different days within a month. The decision to generate the error correlations once a month is thus primarily based on the tradeoff between reliably capturing monthly (or seasonal) variations in the background error and the computational time required to generate the error statistics. Ultimately, the appropriateness of this choice can be assessed by examining the fit between the assimilated CO2 fields and independent data (section 4.3).
 The size of the background error covariance matrix is typically too large to be used directly within the ECMWF 4D-VAR system. Hence, the properties of the background error covariance inferred from the Δ-statistics are subsequently modeled using the mathematical framework of wavelet-like, nonorthogonal basis functions having simultaneous localization, both in space and wave number. This mathematical formulation of the background error covariance matrix is based on Fisher [2004, 2006], and the application to tracer variables is also described in Benedetti and Fisher .
3 Sample Application
 GOSAT observations for the year 2010 are assimilated into the 4D-VAR system to generate global 4D distributions of atmospheric CO2. Two independent sets of experiments are run, in which the background error covariance is prescribed based on the Δ-statistics (henceforth “analysis with Δ-statistics”) and the NMC approach (henceforth “analysis with NMC statistics”), respectively. While the background error using the Δ-statistics method is defined exactly as outlined in section 2.2, the NMC method typically underestimates the background error. In order to offset this underestimation, following Engelen et al. , the standard deviations of the background errors from the NMC method are inflated by a predetermined factor. The magnitude of the inflation factor is determined based on comparisons between the CO2 model concentrations and independent surface and aircraft observations. All other parameters in these experiments are held the same to allow a straightforward evaluation of the impact on the CO2 estimates due to different parameterization of the background error statistics.
 A third set of experiments is run in which fluxes are prescribed but no observational constraint on CO2 is provided (henceforth “unconstrained model run”), to assess the impact of the GOSAT observations on the assimilation, and the impact of each of the background error representations. This experiment assimilates only meteorological observations and transports the CO2 starting from the same initial field on 1 January 2010 as the other two sets of experiments.
3.2 Evaluation of the 4D CO2 Fields From the Experiments
 Given that the true atmospheric CO2 concentrations are unknown globally, the 4D-VAR estimates from the three experiments are evaluated using two sets of observations (Figure 2) that are not included in the assimilation process, namely column-averaged dry mole fractions of CO2 (i.e., XCO2) from the Total Column Carbon Observing Network (TCCON) and vertical profiles of CO2 from aircraft observations. Both of these observational data sets have much higher accuracy than the 4D-VAR analysis, and are assumed to be representative of the true atmospheric state.
 TCCON is a global network of calibrated ground-based Fourier transform spectrometers that measure the total column amount of various species (including CO2) by recording the direct solar spectra in the near-infrared spectral region [Wunch et al., 2011b]. The TCCON observations are compared to cosampled column-averaged 4D-VAR analysis estimates within six latitudinal bands (90°–60°N, 60°–30°N, 30°–0°N, 0°–30°S, 30°–60°S, and 60°–90°S) to assess the impact of latitudinal differences in the background error statistics on the analyses. Because the GOSAT-ACOS XCO2 data have already been calibrated using TCCON data, this comparison primarily evaluates the degree to which the Δ-statistics approach improves the assimilation of GOSAT observations.
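 Mean absolute errors (MAE) [e.g., Willmott and Matsuura, 2005] are calculated across all TCCON sites within each latitudinal band, as is the standard error of the mean:

$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{N_i} \sum_{j=1}^{N_i} \left| y^{\mathrm{TCCON}}_{i,j} - y^{\mathrm{4D\text{-}VAR}}_{i,j} \right| \tag{3}$$

$$\mathrm{SE} = \sqrt{\frac{1}{M(M-1)} \sum_{i=1}^{M} \left( \mathrm{MAE}_i - \mathrm{MAE} \right)^2} \tag{4}$$

where $M$ represents the total number of TCCON sites, $N_i$ represents the total number of observations at the ith TCCON site, $y^{\mathrm{TCCON}}_{i,j}$ is the jth TCCON observation at the ith site, and $y^{\mathrm{4D\text{-}VAR}}_{i,j}$ is the corresponding 4D-VAR analysis value for that TCCON observation. Additionally, the MAE for each TCCON site is separately calculated as

$$\mathrm{MAE}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \left| y^{\mathrm{TCCON}}_{i,j} - y^{\mathrm{4D\text{-}VAR}}_{i,j} \right| \tag{5}$$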
 A second evaluation is carried out using aircraft observations available over North America. The National Oceanic and Atmospheric Administration Earth System Research Laboratory (NOAA-ESRL) has been conducting long-term aircraft monitoring (http://www.esrl.noaa.gov/gmd/ccgg/aircraft/), in which vertical profiles of various trace gases (including CO2) are observed in the troposphere (i.e., surface to 8 km altitude) with high accuracy [e.g., Tans, 1996; Crevoisier et al., 2010]. Sampling frequencies are weekly or biweekly for most sites. The aircraft observations and the sampled 4D-VAR analysis estimates are divided into four altitude bins: 0–2, 2–4, 4–6, and 6–8 km. For each altitude bin, equations equivalent to equations (3)–(5) are applied to calculate the MAE metrics between the 4D-VAR analysis estimates and the aircraft observations. This evaluation exercise provides quantitative information regarding the impact of the background error statistics in the vertical direction. Because the aircraft data have not been used to calibrate the GOSAT data and were not assimilated, this comparison primarily evaluates whether the Δ-statistics approach yields more realistic atmospheric CO2 fields. Finally, because aircraft observations are available only over North America for the study period, only a single latitude band from 90° to 0°N is considered for the aircraft evaluations.
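The evaluation metrics can be sketched as below. The sketch assumes that the band-level (or altitude-bin) MAE is the mean of the per-site MAEs and that the standard error is computed across sites; the function names `site_mae` and `band_mae` and the toy values are hypothetical.

```python
import math


def site_mae(obs, model):
    """Mean absolute error between co-sampled observations and analysis values."""
    return sum(abs(o - m) for o, m in zip(obs, model)) / len(obs)


def band_mae(sites):
    """Mean of per-site MAEs and its standard error across sites.

    sites: list of (obs, model) pairs, one pair of equal-length lists per site.
    Averaging over sites (rather than pooling all observations) is an assumption.
    """
    maes = [site_mae(obs, model) for obs, model in sites]
    m = len(maes)
    mean = sum(maes) / m
    if m < 2:
        return mean, float("nan")  # standard error undefined for a single site
    var = sum((x - mean) ** 2 for x in maes) / (m - 1)
    return mean, math.sqrt(var / m)


# Toy example: two sites in one band, XCO2 in ppm (illustrative values only).
sites = [
    ([390.0, 392.0], [391.0, 391.0]),
    ([388.0, 389.0, 390.0], [388.0, 391.0, 390.0]),
]
mae_band, se_band = band_mae(sites)
```

For the aircraft comparison, the same functions would be applied separately to the samples falling in each altitude bin.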
 As the GOSAT XCO2 retrievals mature and atmospheric CO2 fields are obtained for more years, similar evaluation exercises using a variety of aircraft observations (e.g., CARIBIC—Civil Aircraft for the Regular Investigation of the Atmosphere Based on an Instrument Container, [Brenninkmeijer et al., 2007]; CONTRAIL—Comprehensive Observation Network for Trace gases by Airliner, [Machida et al., 2008]; HIPPO—HIAPER Pole-to-Pole Observations, [Wofsy, 2011]) are planned. These data sets have the additional advantage that they are available beyond the North American domain. Some of them also provide upper tropospheric data, which will allow for rigorous evaluation with these independent data at higher levels of the atmosphere.
4.1 Background Errors From the Δ-Statistics Approach
 Figure 3 shows the horizontal error correlation length and the variance for the near-surface ΔCO2 fields shown in Figures 1a and 1b. The covariance parameters capture both regional and seasonal differences as seen in the ΔCO2 fields. In June, strong biospheric signals dominate the Northern Hemisphere flux uncertainty, and variability is generally high relative to January. Consequently, the covariance parameters in Figures 3b and 3d show higher variances and shorter correlation lengths over the Northern Hemisphere relative to those presented in Figures 3a and 3c. The spatial patterns detected by the Δ-statistics include large regions with relatively low variances of below 2.5 ppm² during the Northern Hemisphere winter (January) and high variances reaching a maximum of 25 ppm² during the Northern Hemisphere summer (June). In June, regions with highly variable surface fluxes (for example, over boreal forests at 50° to 70°N, Figures 3b and 3d) also correspond to regions with large differences between models, and yield high estimated background errors, with variances of up to 25 ppm² and short correlation lengths (3l) below 5000 km. As expected, analyses at higher model levels (Figure S1 in supporting information) show longer correlation lengths and lower variances relative to the surface. For example, over the boreal forests, the variances reduce to 0.3 ppm² with long correlation lengths (3l) of 15,000 km at 20 km elevation. Higher in the atmosphere, the impact of the near-surface CO2 exchange is dampened, and the variability is dominated by the synoptic-scale mixing of air masses.
 During the Northern Hemisphere winter, and at higher levels of the atmosphere, the ΔCO2 fields are affected more by differences in the atmospheric transport than by the uncertainty in the surface fluxes. This is reflected in the correlation length plots (Figure 3a and Figures S1a and S1b in the supporting information) where “streaks” with especially high or low correlation lengths are evident, corresponding to atmospheric transport pathway differences between the models [e.g., Stohl et al., 2002]. These patterns in the ΔCO2 field may also be a product of the different ways in which synoptic systems, for example, convection and turbulence associated with frontal activity, are being handled by the two transport models. In a separate set of tests, the horizontal error correlations were calculated for ΔCO2 fields from different days and different times. The overall structure and pattern of the horizontal error correlation values were found to be similar (results not shown) to those reported in Figure 3 (and Figure S1); hence, the patterns in the horizontal error correlations can be assumed to be representative of the variability in the atmospheric transport rather than artifacts due to synoptic variability.
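 The correlation lengths above are quoted as 3l. A common reason for this convention is an exponential covariance model, for which the correlation drops to about 5% at a separation of three length parameters. The sketch below assumes such a model purely for illustration; this section does not state which covariance function was fitted, and the function name `exponential_covariance` is hypothetical. The variance and range values are the June near-surface figures quoted in the text.

```python
import numpy as np

def exponential_covariance(h_km, variance, length_km):
    """Covariance at separation h_km for an ASSUMED exponential model:
    C(h) = variance * exp(-h / l)."""
    return variance * np.exp(-np.asarray(h_km, dtype=float) / length_km)

variance_ppm2 = 25.0          # June maximum near-surface variance quoted above
effective_range_km = 5000.0   # upper bound on 3l quoted above for boreal forests
l_km = effective_range_km / 3.0

# Correlation (covariance with unit variance) at one effective range, h = 3l.
corr_at_range = float(exponential_covariance(effective_range_km, 1.0, l_km))
print(round(corr_at_range, 3))  # exp(-3), roughly 0.05

# Covariance in ppm^2 at a few separations, for orientation.
print(exponential_covariance([0.0, 1000.0, 5000.0], variance_ppm2, l_km))
```

 Under this assumption, an effective range of 5000 km corresponds to a length parameter l of roughly 1700 km, and 15,000 km to an l of 5000 km.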
 The vertical error correlations (Figure 4) based on the ΔCO2 fields show distinct seasonality and decay more gradually across model levels (Figures 4a and 4b—between 10 and 0 km) relative to those derived from the NMC method. This gradual decay implies that the analysis will spread the information from local observations more in the vertical relative to the analysis using the NMC statistics. Generally speaking, the vertical error correlations from the Δ-statistics by design tend to capture features associated with the uncertainty in atmospheric transport better than the error characteristics using the NMC statistics. During the month of June, for example, the error correlations between the surface and higher levels derived from the Δ-statistics drop sharply to negative values around 10–12 km, which coincides with the location of the tropopause. While the exact causes for these negative correlations remain unclear, reasons related to the uncertainty in the atmospheric transport can be hypothesized.
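 One simple way to obtain a vertical error correlation matrix from difference fields is to treat each horizontal location (or time) as a sample and correlate the differences between model levels. The sketch below is an assumption-laden illustration, not necessarily the procedure used in the study. The function name `vertical_error_correlation` is hypothetical, and the three-level profiles are synthetic, constructed so that the top level is anti-correlated with the surface.

```python
import numpy as np

def vertical_error_correlation(delta):
    """Correlation matrix between model levels.

    delta: array of shape (n_samples, n_levels); each row is one vertical
    profile of differences between two modeled CO2 fields (ppm).
    """
    delta = np.asarray(delta, dtype=float)
    # rowvar=False: columns (levels) are the variables, rows are samples.
    return np.corrcoef(delta, rowvar=False)

# Synthetic profiles, invented for illustration only.
rng = np.random.default_rng(0)
surface = rng.normal(size=500)
lower = 0.8 * surface + 0.2 * rng.normal(size=500)   # tightly coupled level
upper = -0.5 * surface + rng.normal(size=500)        # anti-correlated level
corr = vertical_error_correlation(np.column_stack([surface, lower, upper]))

# First row: correlation of the surface level with every level.
print(np.round(corr[0], 2))
```

 In this toy setup the first row of the matrix changes sign at the top level, which is the qualitative behavior described for the June Δ-statistics near the tropopause.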
 First, examination of the CO2 fields shows strong negative gradients at these atmospheric levels (~10–12 km), especially over the extra-tropical regions. The role of the tropopause as a barrier to isentropic mixing is well-known [e.g., Holton et al., 1995; Andrews et al., 2001; Bönisch et al., 2009], and most likely causes negative gradients in the CO2 concentrations to develop, while also damping the amplitude of the seasonal cycle of CO2 [e.g., Gurk et al., 2008]. When the gradients or amplitude of the CO2 concentrations near the tropopause are not simulated similarly by the two models, large errors (i.e., differences between the models) are inferred relative to the other atmospheric levels. Second, if the two transport models predict different tropopause heights, this may combine with the steep gradients at this level and result in relatively large differences. If these differences are systematic globally, they could create negative correlations with the surface, which is coupled more tightly with the upper troposphere than with the lower stratosphere. Finally, the tropopause errors could be anti-correlated with errors at the surface as a result of local convective vertical transport (pumping) differing between the two transport models. Beyond the tropopause, however, large-scale meridional stratospheric circulation dominates, which is simulated similarly by both the models, resulting in a gradual decrease in the inferred errors.
 The error statistics also provide an initial indication of the observational impact on the analysis. The long ΔCO2 correlation lengths are a result of the large-scale errors in the CO2 background, which primarily arise from the large-scale errors in the flux fields. Within a DA analysis, specifying these long error correlations will induce the analysis to suppress small-scale features while propagating the information from the observations to long distances, in the horizontal and the vertical. If the flux fields differed only on smaller scales, for example, if the signal were dominated by fossil fuel emissions, shorter correlation lengths would have been derived. Thus, one advantage of the proposed Δ-statistics method lies in inferring error correlations that directly depend on the uncertainty associated with all the underlying drivers of the physical problem being examined.
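 The effect of the correlation length on how far a single observation is spread can be illustrated with a one-dimensional, single-observation analysis. For one observation located on a grid point, the standard linear analysis increment reduces to the corresponding column of the background error covariance matrix B, scaled by innovation / (background variance + observation variance). The sketch below is a toy illustration under stated assumptions (an exponential correlation model, a 1-D grid, invented numbers); it is not the ECMWF 4D-VAR system, and the function name `single_obs_increment` is hypothetical.

```python
import numpy as np

def single_obs_increment(x_km, obs_index, innovation, bg_var, length_km, obs_var):
    """Analysis increment on a 1-D grid from one observation at x_km[obs_index].

    Assumes an exponential background error correlation with length length_km.
    """
    x_km = np.asarray(x_km, dtype=float)
    dist = np.abs(x_km - x_km[obs_index])
    b_column = bg_var * np.exp(-dist / length_km)   # column of B at the obs point
    return b_column * innovation / (bg_var + obs_var)

x = np.arange(0.0, 10001.0, 500.0)   # grid points every 500 km
# Same observation, same variances; only the correlation length differs.
short = single_obs_increment(x, 10, innovation=2.0, bg_var=4.0,
                             length_km=300.0, obs_var=1.0)
long_ = single_obs_increment(x, 10, innovation=2.0, bg_var=4.0,
                             length_km=3000.0, obs_var=1.0)

print(round(short[10], 3), round(long_[10], 3))   # increment at the observation
print(round(short[12], 3), round(long_[12], 3))   # increment 1000 km away
```

 Both cases give the same increment at the observation location, but with the longer correlation length most of that adjustment is still present 1000 km away, whereas with the shorter length it has almost vanished.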
 If the ΔCO2 fields are a reasonable representation of the structure of the background errors, then the corresponding horizontal and vertical error correlations indicate that the background error statistics should vary spatially and temporally. By specifying these error statistics to be constant over large areas or forcing them to be invariant in time, the NMC statistics may underestimate (or over-estimate) the magnitude of the true errors and therefore degrade the DA analysis.
4.2 Impact of Background Error Statistics on 4D-VAR Analysis
 Figure 5 shows a sample of atmospheric CO2 fields near the surface from the three 4D-VAR experiments (see Figure S2 in supporting information for a sample of atmospheric CO2 fields at ~45 hPa, or ~20 km elevation above the surface). All three experiments capture the latitudinal and interhemispheric gradients in CO2 as well as the seasonal cycle of the Northern Hemisphere biospheric CO2. Comparison of the two 4D-VAR analyses that assimilate the GOSAT observations with the unconstrained run indicates that the analysis using the Δ-statistics allows more pronounced adjustments by the GOSAT observations than the analysis using the NMC statistics. A typical example is visible over boreal Asia, where the CO2 fields in the analysis with Δ-statistics (Figure 5b) differ more from the unconstrained model run (Figure 5f) than do the CO2 fields in the analysis with NMC statistics (Figure 5d).
 The Δ-statistics and the NMC statistics specify different changes (in terms of both pattern and magnitude) to the background model CO2 concentrations. This can be examined by looking at the monthly-averaged differences between the 4D-VAR analyses based on the two background error statistics and the unconstrained model run (Figure 6) or by looking at the CO2 analysis increments (analysis minus background values), which show the direct impact of the observations on the CO2 field (Figure S3 in supporting information). The CO2 increments in the NMC experiment are generally small scale, while the Δ-statistics allow much broader scale adjustments to the CO2 fields. This supports the original hypothesis that the longer error correlation lengths (both in the horizontal and the vertical) in the Δ-statistics allow the GOSAT observations to have a greater impact (Figures 6a and 6b), and make adjustments over larger scales. The smaller error variances associated with the NMC statistics give less weight to the GOSAT observations in favor of keeping the analysis close to the background CO2 distribution and produce only localized changes (Figures 6c and 6d). The mean absolute difference at 975 hPa (i.e., ~ 0.3 km above the surface) in CO2 concentrations between the analysis with the NMC statistics and the unconstrained model run (January—0.22 ppm, June—0.36 ppm) is smaller, and less variable across seasons, than the corresponding difference between the analysis with the Δ-statistics and the unconstrained model run (January—0.63 ppm; June—3.3 ppm).
 The difference between the two analyses is greater near the surface, where most of the variability in the CO2 processes (such as fossil fuel emissions, biospheric exchanges, influence of boundary layer variations, etc.) is present. This can be primarily attributed to the fact that at lower levels of the atmosphere, (a) the error statistics from the NMC and the Δ-statistics methods are most different and (b) the GOSAT observations tend to have the largest impact on the analysis. The accuracy of these adjustments is next evaluated using independent observations.
4.3 Evaluation of 4D-VAR Analysis Using Independent Observations
 The impact of the background error statistics on the 4D-VAR analysis, in terms of yielding a more realistic atmospheric CO2 product and a better assimilation system, is assessed by comparing the estimated atmospheric CO2 fields to observations from the aircraft and the TCCON networks, respectively. Figure 7 shows two example profiles collected on 9 January 2010 and 26 June 2010 over Worcester, Massachusetts (site code NHA in Figure 2). In both cases, the 4D-VAR analyses reduce the mismatch between the background and the true atmospheric CO2 state. The ability of the GOSAT observations to adjust the details of the background CO2 profile, however, is limited by the NMC method relative to the more flexible Δ-statistics approach. Figure 8 shows the TCCON observations collected over Bialystok, Poland (site code BIA in Figure 2) for different days in the months of January and June 2010. The NMC statistics again limit the degree to which the GOSAT observations can adjust the background state.
 Overall, the CO2 estimates based on the Δ-statistics show significant improvement relative to the analysis using NMC statistics during the Northern Hemisphere summer, when significant uncertainty in the fluxes is present (Figure 9). The variability of the MAE across TCCON sites is also smaller when using the Δ-statistics approach. This indicates that the CO2 concentrations obtained from the analysis with the Δ-statistics are consistently closer to the observed CO2 from the TCCON sites. As outlined earlier, the GOSAT data are themselves calibrated to agree with the TCCON XCO2 observations; hence, better agreement of the analysis with TCCON primarily indicates a more effective assimilation of the GOSAT information rather than a fully independent validation of the CO2 fields. During the Northern Hemisphere winter, the analysis with the Δ-statistics performs similarly to the analysis using NMC statistics (Figure 9a, 30°–60°N). This is not surprising given the dearth of good-quality GOSAT retrievals, as a consequence of which the impact of observations on the analysis is not significant. Similarly, the performance over the Southern Hemisphere winter (Figure 9b, 30°–60°S) is difficult to judge given the paucity of TCCON sites and wintertime CO2 data.
 The aircraft profiles allow us to also examine the assimilation performance at different levels of the atmosphere. As seen from Figure 7, at higher levels of the atmosphere (~ ≤ 45 hPa or ≥ 20 km), the differences in the analysis estimates become negligible due to the following: (a) the background error covariance from the Δ-statistics and the NMC statistics being similar and (b) the GOSAT observations being less informative in the free troposphere and stratosphere relative to the lower troposphere. Between 10 and 20 km (i.e., ~ 200 hPa and 45 hPa) only occasional differences are visible between the 4D-VAR analyses and the unconstrained model run, but both analyses remain close to the background model CO2.
 These patterns are not limited to the particular aircraft profile examined in Figure 7 but are also true for other aircraft profiles (Figure 10). Given that most of the variability is driven by changes at the surface, information from the observations can only change CO2 concentrations at upper levels of the atmosphere through atmospheric transport or through the vertical error correlations prescribed in the background covariance matrix. As discussed earlier in section 4.1, the Δ-statistics are expected to spread information more strongly in the vertical relative to the NMC statistics. Whereas the MAE values for the analyses with the Δ-statistics and the NMC statistics are similar higher up in the atmosphere (Figures 10a and 10b, 6–8 km), differences in the MAE are more clearly visible lower down (Figures 10a and 10b, 0–2 and 2–4 km). Similar to the TCCON evaluation, the analysis with the Δ-statistics outperforms the analysis using the NMC statistics and the unconstrained run slightly during the Northern Hemisphere summer (Figure 10b, 0–2 km), but does slightly worse (Figure 10a, 0–2 km) during the winter. During the winter months, even the analysis with the NMC statistics does worse than the unconstrained model run (Figure 9a, 30°–60°N and Figure 10a, 0–2 km), which points to an inconsistency between the constraint provided by the GOSAT data and the available independent observations from the aircraft samples.
 The 4D-VAR configuration assumes unbiased Gaussian error statistics. Hence, any systematic differences between the observations and the model background may be interpreted incorrectly as an uncertainty rather than a bias. When the background error is small relative to the observation error, as is the case with the NMC experiment, the analysis will be directed toward the model background, irrespective of the bias in either the observations or the model background itself. When the background error is large compared to the observation error, as is the case in the Δ-statistics experiment, the analysis is pulled toward the observations in the areas where they are available. Ultimately, this leads to the observed differences in the final CO2 analysis. Previous experiments (results not shown) with a biased set of AIRS data demonstrated that the observational constraint must be unbiased for the Δ-statistics to provide reasonable results. The observed poorer performance of the Δ-statistics during the winter months could therefore also be a by-product of biases in the GOSAT XCO2 retrievals. In this case, erroneous data combined with loose background error statistics (i.e., Δ-statistics) are potentially more harmful than the same data combined with more constrained background error statistics (i.e., NMC statistics). The challenge, as in all DA systems, remains to achieve the best balance (i.e., weighting) between the observation and the background information sources.
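 The weighting argument above can be made concrete with a scalar analysis, in which the analysis equals the background plus a gain times the innovation, with gain = background variance / (background variance + observation variance). The sketch below is a toy illustration with invented numbers, not a representation of the ECMWF system. The function name `scalar_analysis` and the labels "tight" and "loose" (standing in for small and large background error variance) are hypothetical.

```python
def scalar_analysis(background, observation, bg_var, obs_var):
    """Scalar analysis: background plus gain times innovation."""
    gain = bg_var / (bg_var + obs_var)
    return background + gain * (observation - background)

truth = 390.0        # ppm, invented
background = 391.0   # background is 1 ppm too high
obs_var = 1.0        # ppm^2

for label, obs in (("unbiased obs", truth), ("biased obs (+2 ppm)", truth + 2.0)):
    tight = scalar_analysis(background, obs, bg_var=0.25, obs_var=obs_var)
    loose = scalar_analysis(background, obs, bg_var=4.0, obs_var=obs_var)
    print(label,
          "| tight-B error:", round(abs(tight - truth), 2),
          "| loose-B error:", round(abs(loose - truth), 2))
```

 With an unbiased observation, the larger background variance gives the smaller analysis error, because the analysis is pulled toward accurate data. With the biased observation, the ordering reverses, which mirrors the point that loose background error statistics are more exposed to biased retrievals.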
 The specification of background error covariance statistics is a critical component of any atmospheric data assimilation system, and accurate representation of these statistics is necessary in order to make efficient use of the observational information. This study examined the specification of the background error statistics for the atmospheric CO2 data assimilation problem. Using the state-of-the-art 4D-VAR system in place at ECMWF, this study shows that it is necessary to modify the forecast or ensemble-based techniques prevalent in NWP applications to account for both the errors in the underlying CO2 fluxes and the errors associated with atmospheric transport of CO2. Limitations associated with existing methods, such as the NMC method, prompted the investigation of a new flexible approach for parameterizing the background error statistics that is more suited to atmospheric CO2 data assimilation.
 Using the difference between CO2 concentrations resulting from alternate sets of CO2 flux and transport models as a proxy for the background errors, spatial statistical tools were used to generate the background error statistics. The resultant error statistics were consistent with the large-scale structures in the background error, and implied errors correlated over longer distances relative to those deduced from the NMC statistics. This allowed the information from the assimilated observations to reduce errors over larger areas. For the test cases explored here, experiments using GOSAT CO2 observations and subsequent evaluation with independent CO2 observations illustrated that taking into account the errors in the background CO2 fluxes is necessary to improve the assimilated atmospheric CO2 concentrations. Because the variability in the CO2 background, especially near the surface, is primarily driven by the underlying variability in the surface fluxes, this study makes the clear case for including a good estimate of surface flux errors and error correlations in any method that is used to estimate the background error statistics within a CO2 data assimilation system.
 The experiments also demonstrated some caveats associated with the proposed Δ-statistics approach. First, a judicious selection of the flux and transport models is recommended to capture realistic uncertainty in the background. Different combinations of models may have different representations of the underlying mechanistic processes and/or atmospheric transport, and spatial and longitudinal biases may crop up near the surface due to inaccurate representations of these processes. Second, unlike the statistics from a forecast-based technique such as the NMC method, the generation of the Δ-statistics requires additional analysis and computational time. Future work will investigate possible ways to generate the error statistics from an ensemble of models to reduce such biases and also reduce the overall computational time required to generate the background error statistics. The overall benefits from the new approach, however, outweigh these drawbacks, as demonstrated by a superior match to independent data relative to the existing NMC method. This exercise also provided a general insight into the improvements that can be achieved within operational atmospheric data assimilation systems from better representations of background error statistics.
 The authors thank Derek Posselt, Peter Adriaens and three anonymous reviewers for fruitful comments and discussions regarding this work, the many people at ECMWF who helped build the tracer data assimilation system, and partners within the GEMS and MACC projects. This work was supported by the National Aeronautics and Space Administration (NASA) through Earth System Science Fellowship for Abhishek Chatterjee, under grant NNX09AO10H. Additional support was provided through NASA grant NNX12AB90G. The work of Richard Engelen was funded through the MACC project, which is funded by the European Commission under the Seventh Research Framework Programme, contract 218793. The work of Stephan Kawa was supported through the NASA Carbon Cycle Science and Atmospheric CO2 Observations from Space opportunities. TCCON data were obtained from the TCCON Data Archive, operated by the California Institute of Technology from the website at http://tccon.ipac.caltech.edu/. Finally, the GOSAT-ACOS data were produced by the ACOS/OCO-2 project at the Jet Propulsion Laboratory, California Institute of Technology, and obtained from the ACOS/OCO-2 data archive maintained at the NASA Goddard Earth Science Data and Information Services Center.