CO2 surface fluxes that are statistically consistent with surface layer measurements of CO2, when propagated forward in time by atmospheric transport models, underestimate the seasonal cycle amplitude of total column CO2 in the northern temperate latitudes by 1–2 ppm. In this paper we verify the systematic nature of this underestimation at a number of Total Carbon Column Observing Network (TCCON) stations by comparing their measurements with a number of transport models. In particular, at Park Falls, Wisconsin (United States), we estimate this mismatch to be 1.4 ppm and try to attribute portions of this mismatch to different factors affecting the total column. We find that errors due to (1) the averaging kernel and prior profile used in forward models, (2) water vapor in the model atmosphere, (3) incorrect vertical transport by transport models in the free troposphere, (4) incorrect aging of air in transport models in the stratosphere, and (5) air mass dependence in TCCON data can explain up to 1 ppm of this mismatch. The remaining 0.4 ppm mismatch is at the edge of the ≤0.4 ppm accuracy requirement on satellite measurements to improve on our current estimate of surface fluxes. Uncertainties in the biosphere fluxes driving the transport models could explain a part of the remaining 0.4 ppm mismatch, implying that with corrections to the factors behind the accounted-for 1 ppm underestimation, present inverse modeling frameworks could effectively assimilate satellite CO2 measurements.
 An accurate estimate of the spatial distribution of global CO2 surface flux and its annual and interannual variation has been one of the holy grails of carbon cycle research for several decades. Such an estimate would, among other things, improve our ability to close regional and global carbon budgets [Sarmiento and Gruber, 2002] and quantify the effect of climate change on the size of terrestrial and oceanic carbon sources and sinks [Le Quéré, 2007; Law et al., 2008a; Zickfeld et al., 2008]. Atmospheric inverse modeling of CO2, which uses a global atmospheric transport model to arrive at a spatiotemporal distribution of surface fluxes that are statistically consistent with measurements of CO2, is a powerful top down method for estimating the sizes of those sources and sinks [Enting et al., 1995; Kaminski et al., 1999; Rayner et al., 1999; Bousquet et al., 2000; Krakauer et al., 2004; Baker et al., 2006; Rödenbeck et al., 2003]. State of the art CO2 inverse models today use surface measurements of CO2 taken at multiple stations across the globe [Tans et al., 1990; Conway et al., 1994; Bakwin et al., 1998]. The amount of surface data, however, is not enough to strongly constrain the spatiotemporal variability of surface fluxes. For example, the uncertainty in the mean annual CO2 flux from 1992 to 1996 over South America is 0.64 Gt C/yr, compared to a mean annual flux of −0.24 Gt C/yr over that region [Gurney, 2004]. Utilizing satellite measurements of total column CO2 from instruments such as GOSAT [Hamazaki et al., 2004] can theoretically better constrain the flux estimate and increase its accuracy [Chevallier et al., 2007].
 To be useful for atmospheric inversions, satellite measurements need to be accurate enough to provide a tighter constraint on surface fluxes than that already provided by surface measurements. Specifically, to reduce the uncertainties in annual surface fluxes aggregated over 1000 × 1000 km2 grid cells to 40% of their present values, systematic errors on total column measurements must be less than 0.4 ppm [Houweling et al., 2010; Ingmann et al., 2008; Chevallier et al., 2010]. However, such observational accuracy will be wasted unless the transport models used for atmospheric inversions can simulate the total column CO2 mixing ratio (hereafter referred to as XCO2) equally accurately when forced with “true” surface fluxes. To check whether atmospheric transport models satisfy that requirement, we need to check the XCO2 simulated by transport models against observed XCO2.
 This check cannot yet be performed against satellite measurements of XCO2 since they themselves have yet to meet the accuracy requirements mentioned above. To calibrate satellite measurements, a ground-based global network (the Total Carbon Column Observing Network, or TCCON) of sun-facing Fourier transform spectrometers has been built to provide continuous measurements of XCO2 and other trace gases [Toon et al., 2009; Wunch et al., 2011]. We use TCCON measurements of XCO2 to validate XCO2 simulated by transport models. In particular, we look at the seasonal cycle of XCO2 as observed by TCCON and as simulated by models.
 The seasonal cycle amplitude of XCO2 represents the most significant temporal variation of CO2 surface flux over annual time scales. Moreover, due to the relatively fast zonal transport times compared to latitudinal transport times, the XCO2 seasonal cycle contains information about the seasonality of surface fluxes in the entire latitudinal band where the site is located. Therefore, small variations in the seasonal amplitude of XCO2 can, when inverted, translate to large variations in surface fluxes in the entire latitudinal band. For example, Yang et al. [2007] demonstrated that given an atmospheric transport model, a Net Ecosystem Exchange (NEE) in the northern temperate latitudes that is statistically consistent with surface CO2 measurements needs to be increased by ∼28% to be consistent with measurements that depend on CO2 higher up, specifically, XCO2 measurements at Park Falls and Kitt Peak, and short-term aircraft measurements at a few other sites. They calculated the NEE adjustment on the assumption that the entire mismatch in observations stemmed from incorrect biosphere fluxes fed to the transport models. However, specifying “true” biosphere fluxes to transport models will not necessarily produce XCO2 consistent with observations due to shortcomings of the models. In this paper, we assess the impact of the transport models themselves on the model-observation mismatch of the XCO2 seasonal cycle amplitude.
 We first confirm that at multiple TCCON sites in the northern temperate latitudes, transport models systematically simulate a seasonal cycle amplitude of XCO2 that is lower by ∼2 ppm compared to observations. In doing this we demonstrate that the estimate of Yang et al. [2007], which was based on one TCCON site (Park Falls), one other site (Kitt Peak), and a few short-term aircraft measurements, holds globally for multiple years in the northern temperate latitudes. This mismatch could come from either observational (TCCON) errors or model errors. Model errors come from errors in specifying the surface flux, errors in transport, and errors due to the coarse spatial resolution of the model, the so-called representation errors. Representation errors alone, which can be estimated from modeled tracer gradients, are typically larger than reported errors in TCCON data [Wunch et al., 2010]. Hence model errors are expected to be much larger than TCCON measurement errors, and we consider them to be the major factors behind the aforementioned mismatch. Our goal therefore is to estimate parts of the mismatch coming from different model errors. Specifically, we examine how transport models perform at different altitudes, and whether the majority of the mismatch comes from a few problematic pressure levels. Such a diagnosis would provide pointers on how to fix the modeled transport at those levels without introducing errors at other levels.
 Modeled CO2 concentrations in this paper come from CarbonTracker release CT2010 and four transport models from the TRANSCOM 4 model intercomparison experiment targeting the use of satellite measurements [Gurney, 2004; Patra et al., 2008; Saito et al., 2011]. The aim of the TRANSCOM experiment was to characterize the expected variability of total column CO2 over a range of spatiotemporal scales, and to investigate how well these variations are reproduced by atmospheric transport models, given their uncertainties. The experimental setup was the same as used for the TRANSCOM continuous experiment [Law et al., 2008b], except that three-dimensional concentration fields were stored at the time-step of the models. Furthermore, additional tracers were introduced replacing the CASA-derived terrestrial biosphere flux with hourly fluxes from CarbonTracker [Peters et al., 2007], which were fed to the models either hourly or as daily and monthly averages. In this paper we use output for the tracer corresponding to hourly CarbonTracker fluxes.
 The structure of this manuscript is as follows. In section 3 we demonstrate that transport models systematically underestimate the seasonal cycle amplitude of XCO2 at multiple TCCON sites. In section 4 we estimate the contributions to this mismatch from factors that affect the entire vertical column, namely the averaging kernel of TCCON instruments, air mass–dependent corrections to TCCON observations and atmospheric water column corrections to modeled XCO2. In section 5 we break the remaining mismatch down into mismatches originating at different vertical levels. We use NOAA CMDL CO2 measurements in the surface layer, aircraft CO2 measurements over some locations in the lower troposphere, CONTRAIL CO2 measurements in the free troposphere, and age-of-air experiments in the stratosphere to examine model-observation mismatches at each of these layers. We summarize these comparisons in a “mismatch budget” in section 6. This budget falls short of the model-TCCON mismatch seen in section 4, and in section 7 we hypothesize on possible ways to close this budget.
2. Procedure for Data Analysis
2.1. Comparing Models to TCCON and Surface Data
 To construct seasonal cycles, a multiyear time series is detrended and collapsed onto a single year. Throughout this paper, CO2 concentrations were detrended by 2.01 ppm/yr. This trend was deduced from CO2 measurements at Mauna Loa, Hawaii, which, being in the middle of the ocean and far away from anthropogenic and industrial influences, is representative of the global growth rate. Although the long-term trend since the start of the Mauna Loa CO2 record in 1974 is not linear, 2.01 ppm/yr is a very good approximation to the trend seen over the last ten years, which includes the time periods for our observations and model runs.
 To compare with observed XCO2, detrended model data, available as three-dimensional concentration fields, were vertically averaged by weighting the mixing ratio in each vertical layer by the air mass in the layer:
$$X_{\mathrm{CO_2}} = \frac{1}{p_{\mathrm{surface}}} \sum_i \chi_i \,(p_i - p_{i+1}), \qquad (1)$$
χi being the CO2 mixing ratio in layer i above a certain surface location. Layer i is between pressure levels pi and pi+1. The layer index increases from the ground up, so the surface pressure is p0 = psurface. For comparing with surface measurements at locations with sampling towers, surface data at the highest tower levels were chosen to eliminate localized fluctuations. For example, at Park Falls we selected the tower data from 868 meters above sea level (396 meters above the surface).
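The air-mass weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `column_average` and the toy profile are hypothetical, and the sum is normalized by the total pressure span of the layers, which equals psurface only when the model top is at 0 hPa.

```python
import numpy as np

def column_average(mixing_ratio, pressure_edges):
    """Air-mass-weighted column average of a layered mixing-ratio profile.

    mixing_ratio   : CO2 mixing ratio per layer (ppm), ordered ground to top
    pressure_edges : layer boundary pressures (hPa); one more entry than
                     layers, starting at the surface and decreasing upward
    """
    x = np.asarray(mixing_ratio, dtype=float)
    p = np.asarray(pressure_edges, dtype=float)
    if p.size != x.size + 1:
        raise ValueError("need one more pressure edge than layers")
    thickness = p[:-1] - p[1:]  # p_i - p_{i+1}, positive for every layer
    # Assumption: normalize by the pressure span actually covered, which
    # reduces to p_surface when the top edge sits at 0 hPa.
    return float(np.sum(x * thickness) / np.sum(thickness))

# Toy two-layer profile (hypothetical numbers, not model output)
print(column_average([390.0, 380.0], [1000.0, 500.0, 0.0]))
```

Because the weights are pressure thicknesses, a thin boundary layer with a large seasonal signal contributes less to the column than its mixing ratio alone would suggest.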
 Ideally, the vertical averaging (equation (1)) of modeled CO2 over a TCCON site should include the averaging kernel and prior profile used in the TCCON retrieval. The averaging kernel matrix A describes the sensitivity of the retrieved vertical profile xret to the true vertical profile xtrue and the a priori vertical profile xprior [Rodgers, 2003; Deeter et al., 2003]
$$x_{\mathrm{ret},i} = x_{\mathrm{prior},i} + \sum_j A_{ij}\,\left(x_{\mathrm{true},j} - x_{\mathrm{prior},j}\right), \qquad (2)$$
where the subscripts refer to different vertical layers. For a total column measurement, the matrix A is summed over different layers to yield a total column averaging kernel [Deeter et al., 2007]. In section 4.2 we assess the impact of including the averaging kernel on the modeled seasonal cycle amplitude, as opposed to using a unit averaging kernel.
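As a rough illustration of how a total column averaging kernel enters the vertical averaging, the sketch below applies a per-layer column kernel to the difference between a model profile and the prior before pressure-weighting. The function name `smoothed_column`, the per-layer form of the kernel, and all numbers are assumptions made for illustration; they are not taken from the TCCON retrieval code.

```python
import numpy as np

def smoothed_column(x_model, x_prior, column_ak, pressure_edges):
    """Column average of a model profile as seen through a column averaging kernel.

    x_model, x_prior : CO2 mixing ratio per layer (ppm), ground to top
    column_ak        : column averaging kernel value per layer (assumed form)
    pressure_edges   : layer boundary pressures (hPa), surface first
    """
    x_model = np.asarray(x_model, dtype=float)
    x_prior = np.asarray(x_prior, dtype=float)
    a = np.asarray(column_ak, dtype=float)
    p = np.asarray(pressure_edges, dtype=float)
    # Pressure weights that sum to one over the column
    h = (p[:-1] - p[1:]) / (p[0] - p[-1])
    # Each layer keeps the prior plus the kernel-weighted departure from it
    return float(np.sum(h * (x_prior + a * (x_model - x_prior))))

edges = [1000.0, 500.0, 0.0]
# A unit kernel reproduces the plain pressure-weighted average of the model
print(smoothed_column([390.0, 380.0], [380.0, 370.0], [1.0, 1.0], edges))
# A kernel below one in the lowest layer pulls the result toward the prior
print(smoothed_column([390.0, 380.0], [380.0, 370.0], [0.5, 1.0], edges))
```

With a unit kernel the prior drops out entirely, which is why a unit-kernel run serves as a natural control experiment.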
 Most of the TCCON stations have come online only in the past couple of years, while the model data used in this paper span 2002–2003 (TRANSCOM 4 models) and 2001–2009 (CarbonTracker). Therefore exact cosampling of models and observations was not always possible. For Figure 2, we constructed 24 hr averages of the model data, whereas the TCCON daily averages, being daytime measurements, reflect daytime averages. This could give rise to some bias in the seasonal cycle, which we discuss in section 3. For a more detailed analysis later at the Park Falls, Wisconsin TCCON site, we sampled CarbonTracker exactly at the TCCON measurement times and constructed averages of all the measurements on a single day, thus precluding the aforementioned sampling bias. Such a cosampling was used to construct, for example, Figure 3, and was used in our analysis from section 4 onward.
 Since the seasonal cycle amplitude does not have a known significant trend, we expect the average seasonal cycle amplitude over multiple years at a location to be a good representation of the seasonal cycle at that location. Both modeled and observed data were binned into 366 bins, with each bin corresponding to a day of year. The data inside each bin were averaged to yield a mean mixing ratio for that day of year as well as its error. To assess the seasonal cycle amplitude, the daily averaged data were smoothed using Loess smoothing [Cleveland et al., 1990]. The peak-to-peak values of the smoothed curves, i.e., the differences between the spring peaks and the summer troughs in the Northern Hemisphere, were used as estimates of the respective seasonal cycle amplitudes.
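The detrend, bin, smooth and peak-to-peak steps can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures: the function name and synthetic series are hypothetical, and a circular running mean stands in for Loess smoothing, so it slightly damps the amplitude relative to a local regression.

```python
import numpy as np

def seasonal_amplitude(decimal_year, co2_ppm, trend_ppm_per_yr=2.01, window_days=31):
    """Peak-to-peak seasonal amplitude of a multiyear series collapsed onto one year."""
    t = np.asarray(decimal_year, dtype=float)
    c = np.asarray(co2_ppm, dtype=float)
    # Remove the linear growth relative to the first sample
    detrended = c - trend_ppm_per_yr * (t - t[0])
    # Assign every sample to one of 366 day-of-year bins (0..365)
    doy = np.floor((t - np.floor(t)) * 366).astype(int) % 366
    sums = np.bincount(doy, weights=detrended, minlength=366)
    counts = np.bincount(doy, minlength=366)
    filled = counts > 0
    daily_mean = sums[filled] / counts[filled]
    # Stand-in for Loess: running mean over the occupied bins, wrapped
    # around the year boundary so December and January smooth together
    half = window_days // 2
    padded = np.concatenate([daily_mean[-half:], daily_mean, daily_mean[:half]])
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    smooth = np.convolve(padded, kernel, mode="valid")
    return float(smooth.max() - smooth.min())

# Synthetic four-year record: 2.01 ppm/yr growth plus an 8 ppm peak-to-peak cycle
t = 2005.0 + (np.arange(4 * 366) + 0.5) / 366.0
c = 380.0 + 2.01 * (t - t[0]) + 4.0 * np.sin(2 * np.pi * t)
print(seasonal_amplitude(t, c))
```

The smoothing window is a free parameter here; a wider window suppresses more day-to-day scatter but also removes more of the true seasonal amplitude.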
2.2. Comparing Models to Aircraft Data
 In section 5.2 we compare CarbonTracker CO2 concentrations with aircraft samples. All the aircraft data considered were gathered between 2001 and 2008, the period for which we also possess CarbonTracker CO2 fields. This enabled us to sample CarbonTracker CO2 concentrations at the same spatiotemporal locations as the aircraft samples. CarbonTracker uses pressure as a vertical coordinate, whereas certain aircraft campaigns used altitude. For aircraft data reporting altitude but not pressure, radiosonde measurements over Lamont, Oklahoma, were used to translate between altitude and pressure levels. After cosampling, both modeled and observed data were binned into days of the year as before, and averaged to yield a single value per location within some pressure interval for each day of the year. The resulting averages were Loess smoothed to estimate the peak-to-peak seasonal cycle amplitudes.
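One simple way to translate aircraft altitudes into pressures from a radiosonde profile is to interpolate the logarithm of pressure linearly in altitude. The sketch below assumes that approach; the interpolation scheme, the function name and the sounding values are illustrative assumptions rather than the procedure actually applied to the Lamont radiosonde data.

```python
import numpy as np

def altitude_to_pressure(altitude_m, sonde_altitude_m, sonde_pressure_hpa):
    """Map altitudes to pressures using one radiosonde profile.

    Interpolates log(pressure) linearly in altitude, which follows the
    roughly exponential decrease of pressure with height.
    """
    z = np.asarray(sonde_altitude_m, dtype=float)
    log_p = np.log(np.asarray(sonde_pressure_hpa, dtype=float))
    order = np.argsort(z)  # np.interp needs increasing x values
    return np.exp(np.interp(altitude_m, z[order], log_p[order]))

# Hypothetical sounding levels (not measured values)
sonde_z = [300.0, 1500.0, 3000.0, 5500.0]
sonde_p = [975.0, 850.0, 700.0, 500.0]
print(altitude_to_pressure([300.0, 2000.0, 5500.0], sonde_z, sonde_p))
```

Altitudes outside the sounding range are clamped to the nearest sounding level by `np.interp`, so samples above or below the profile should be treated with care.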
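 In section 5.2 we estimate the contribution of different vertical layers to the seasonal cycle amplitude. To do this, we note that the extrema in XCO2 do not occur at the same time as the extrema in the individual layers (Figure 1). The amplitude in the seasonal cycle of XCO2, hereafter referred to as X, can be written as
$$X = X_{\mathrm{CO_2}}(t_s) - X_{\mathrm{CO_2}}(t_f) = \frac{1}{p_{\mathrm{surface}}} \sum_i \left[\chi_i(t_s) - \chi_i(t_f)\right](p_i - p_{i+1}), \qquad (3)$$
where χi is the CO2 mixing ratio in layer i as in equation (1), and ts and tf denote the times for the spring peak and fall trough in XCO2 as read off from the modeled total column. Each term inside the summation can be thought of as the contribution from one layer to X. Specifically, the contribution of a layer between pressures p1 and p2 < p1 to X is
$$\Delta X(p_1, p_2) = \frac{1}{p_{\mathrm{surface}}} \left[\chi(t_s) - \chi(t_f)\right](p_1 - p_2), \qquad (4)$$
with χ the CO2 mixing ratio in that layer. So the cumulative contribution up to pressure level pi plotted in Figures 1 (middle) and 1 (right), expressed as a fraction of X, is
$$C(p_i) = \frac{1}{X\,p_{\mathrm{surface}}} \sum_{j<i} \left[\chi_j(t_s) - \chi_j(t_f)\right](p_j - p_{j+1}). \qquad (5)$$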
We have plotted this measure of cumulative contribution as derived from CONTRAIL measurements in Figure 1 (middle). This measure can be used to split the total column seasonal cycle amplitude into contributions coming from different layers, or to extrapolate partial column amplitudes into total column amplitudes. For example, suppose that the seasonal cycle amplitude at some location in the northern temperate latitude band is 1.56 ppm. According to Figure 1 (middle), 32% of this amplitude comes from layers between 730 hPa and 470 hPa (shaded interval). Therefore, the contribution of this partial column to the seasonal cycle amplitude is 32% of 1.56 ppm, or 0.5 ppm. Alternatively, if we only had aircraft measurements over that same pressure interval, and if those measurements gave us a partial column seasonal cycle amplitude of 0.5 ppm, then we could estimate the total column seasonal cycle amplitude to be 1.56 ppm, under the assumption that the partial column aircraft measurements have the same characteristics as CONTRAIL measurements.
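The per-layer decomposition described above can be illustrated with a short Python sketch. The function names and the three-layer profiles are hypothetical, and the sketch assumes the layer edges run from the surface pressure down to 0 hPa so that the layer contributions add up to the full column amplitude.

```python
import numpy as np

def layer_contributions(x_spring, x_fall, pressure_edges):
    """Contribution of each layer (ppm) to the total column seasonal amplitude.

    x_spring, x_fall : layer mixing ratios (ppm) at the times of the total
                       column spring peak and fall trough, ground to top
    pressure_edges   : layer boundary pressures (hPa), surface first
    """
    diff = np.asarray(x_spring, dtype=float) - np.asarray(x_fall, dtype=float)
    p = np.asarray(pressure_edges, dtype=float)
    # Weight each layer's peak-minus-trough difference by its share of the column
    return diff * (p[:-1] - p[1:]) / p[0]

def cumulative_fraction(contributions):
    """Running share of the total amplitude, accumulated from the ground up."""
    contributions = np.asarray(contributions, dtype=float)
    return np.cumsum(contributions) / np.sum(contributions)

# Hypothetical profiles: the seasonal swing weakens with height
contrib = layer_contributions([395.0, 392.0, 390.0],
                              [385.0, 387.0, 389.0],
                              [1000.0, 700.0, 400.0, 0.0])
print(contrib, contrib.sum())
print(cumulative_fraction(contrib))
```

Evaluating both profiles at the total column extrema, rather than at each layer's own extrema, is what makes the layer contributions sum exactly to the column amplitude.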
 Moreover, if we follow the procedure described above with CarbonTracker CO2 fields to estimate the per-layer contribution to the modeled seasonal cycle, we see the same sort of graphs for cumulative contribution. Figure 1 (right) shows these graphs over multiple locations where we also have observational aircraft data. Briggsdale, Colorado, is a deviant site since it is a high-altitude site, with a lower surface pressure. If both observations and models follow the contribution profile in Figure 1, then so should their difference. For example, over Park Falls, layers below 400 hPa contribute 80% of both the modeled and observed seasonal cycle amplitude. Therefore a mismatch between the two over that partial column should also be 80% of the model-observation mismatch in the seasonal cycle amplitude.
 We have lower tropospheric aircraft CO2 measurements over multiple locations. In section 5.2.3 we use the procedure detailed above to extrapolate the partial column mismatches derived from those measurements to estimate mismatches in the seasonal cycle amplitude. For example, over Estevan Point, British Columbia, we have aircraft measurements of the CO2 vertical profile between 981 hPa and 526 hPa. The seasonal cycle amplitude across this partial column calculated from aircraft data is higher than the modeled partial column seasonal cycle amplitude by 0.27 ppm. According to Figure 1 (right), this pressure interval contributes 60% of the total column seasonal cycle over Estevan Point. Therefore, the expected mismatch in the total column seasonal cycle is 0.45 ppm.
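The extrapolation from a partial column to the total column is a single division, sketched below with the numbers quoted in this section. The function name is hypothetical, and the sketch assumes, as the text does, that models and observations share the same contribution profile.

```python
def extrapolate_to_total_column(partial_value_ppm, fraction_of_total):
    """Scale a partial column amplitude (or mismatch) up to the total column.

    fraction_of_total : share of the total column seasonal cycle contributed
                        by the sampled pressure interval, between 0 and 1
    """
    if not 0.0 < fraction_of_total <= 1.0:
        raise ValueError("fraction_of_total must lie in (0, 1]")
    return partial_value_ppm / fraction_of_total

# Estevan Point: 0.27 ppm partial column mismatch over 60% of the seasonal cycle
print(round(extrapolate_to_total_column(0.27, 0.60), 2))
# Forward direction of the 730 hPa example: 32% of a 1.56 ppm amplitude
print(round(0.32 * 1.56, 2))
```

The smaller the sampled fraction, the more any error in the partial column mismatch is amplified in the extrapolated total.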
2.3. Estimating the Contribution of Surface Flux Uncertainty
 In section 7.1 we estimate the impact of surface flux uncertainty on the seasonal cycle amplitude. For this purpose, optimized surface CO2 fluxes from six different inversion frameworks were propagated forward using the TM5 transport model [Krol et al., 2005] on a 6° × 4° global grid with 25 σ-pressure hybrid vertical levels from 1 January 2003 to 31 December 2006. For each biosphere flux specification, XCO2 was calculated above the TCCON sites and detrended, collapsed onto one year and smoothed according to the procedure of section 2.1. The peak-to-peak values of the resulting curves, as in section 2.1, were used as estimates of the seasonal cycle amplitudes.
3. Results: TCCON Measurements Versus Models
 Following the procedure described in section 2.1, we compared total column CO2 mixing ratio measurements from TCCON sites (D. Wunch et al., The Total Carbon Column Observing Network, TCCON Data Archive, 2010, http://tccon.ipac.caltech.edu) [Toon et al., 2009] with total columns simulated by four models from the TRANSCOM 4 experiment (ACTM, LMDZ4, NIES05 and TM5) and CarbonTracker. The modeled XCO2 was calculated following equation (1). Figure 2 shows the spread of the four TRANSCOM models (green patch), TCCON daily averages (red circles) and CarbonTracker daily averages (blue squares).
 As can be seen in Figure 2, while all the models can reproduce the phasing of the seasonal cycles of XCO2 at multiple TCCON sites, they consistently underestimate their amplitudes by 1–3 ppm. This underestimation is systematic and is seen at several TCCON sites, such as Park Falls (Figure 2, left), Lamont (Figure 2, middle) and Tsukuba (Figure 2, right).
 Among all the TCCON sites in the northern temperate latitudes, Park Falls has the longest period of data acquisition, relatively few data gaps and a large seasonal cycle amplitude, resulting in good statistics during analysis. It also has surface layer data from the NOAA flask sampling network and aircraft overflight data at different altitudes over several years. Therefore, henceforth we will focus on Park Falls as a typical northern temperate TCCON station, to construct an “error budget” for the mismatch between TCCON-observed and modeled seasonal cycle amplitudes.
 For the four TRANSCOM 4 models we have model data for two years (2002 and 2003) whereas for CarbonTracker we have model data for nine years (2001 to 2009), which results in better statistics. According to Figure 2 and similar analyses over other TCCON sites, the seasonal cycle amplitude of XCO2 modeled by CarbonTracker is neither consistently higher nor consistently lower than other models, making its output fields good proxies for typical “model data.” Therefore, throughout the rest of this paper we will use the term “model data” to refer to CarbonTracker data, unless otherwise specified. Later in section 8 we will discuss how our conclusions are affected by variations between different transport models.
 Using CarbonTracker as a typical model has the added advantage that unlike the TRANSCOM 4 models, CarbonTracker data has almost complete temporal overlap with most TCCON stations, including Park Falls. Hence, the model can be cosampled at the exact times of the TCCON samples before it is detrended and averaged following the prescription of section 2.1. Such temporal cosampling nullifies any bias between model and observations that might arise from TCCON samplings being strictly daytime. The resulting seasonal cycle of CarbonTracker is plotted against the TCCON seasonal cycle at Park Falls in Figure 3. The modeled seasonal cycle amplitude becomes 7.8 ppm, still 1.4 ppm lower than that measured by TCCON. In the rest of this paper, we try to account quantitatively for the different factors behind this mismatch.
4. Factors Affecting the Whole Column
 As a first step toward understanding the mismatches seen in Figures 2 and 3, we consider three factors that could affect XCO2 in models and measurements.
4.1. Water Vapor
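 XCO2 calculated by CarbonTracker is a “wet air” mole fraction; that is, in the model, psurface of equation (1) is the barometric pressure. By contrast, XCO2 reported by TCCON is the “dry air” mole fraction; that is, psurface excludes the surface pressure due to the total water column in the atmosphere. Converting the modeled “wet air” XCO2 to a “dry air” XCO2 results in a small correction:
$$X_{\mathrm{CO_2}}^{\mathrm{dry}} = X_{\mathrm{CO_2}}^{\mathrm{wet}} \times \frac{p_{\mathrm{surface}}}{p_{\mathrm{surface}} - p_{\mathrm{H_2O}}}, \qquad (6)$$
where pH2O is the surface pressure due to the total water column.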
 The correction factor multiplying XCO2 in equation (6) is always >1, and has a seasonal variation owing to the seasonal variation of atmospheric water content. This factor was calculated over Park Falls from ECMWF ERA-Interim water columns and surface pressures for the entire TCCON operation period. As can be seen in Figure 4, this factor is higher in late summer (when there is more water in the atmosphere) than in early spring. So the correction of equation (6) raises the fall trough of XCO2 more than it raises the spring maximum, decreasing the modeled seasonal cycle amplitude. This decrement over Park Falls, as seen in Figure 5, is 0.4 ppm, which is therefore the amount by which the model-observation mismatch is increased.
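A minimal sketch of the wet-to-dry conversion is given below. It assumes that the water column is supplied as a mass per unit area (kg/m2), so that its surface pressure is the column mass times the gravitational acceleration; the function names and the example water columns are illustrative and are not the ERA-Interim values used for Figure 4.

```python
G = 9.80665  # standard gravitational acceleration, m s^-2

def dry_air_correction_factor(surface_pressure_hpa, water_column_kg_m2):
    """Factor that converts a wet air column average to a dry air one."""
    # Surface pressure exerted by the water column: Pa = kg m^-2 * m s^-2,
    # then divided by 100 to obtain hPa
    water_pressure_hpa = G * water_column_kg_m2 / 100.0
    return surface_pressure_hpa / (surface_pressure_hpa - water_pressure_hpa)

def wet_to_dry(xco2_wet_ppm, surface_pressure_hpa, water_column_kg_m2):
    """Apply the correction factor to a wet air column average (ppm)."""
    return xco2_wet_ppm * dry_air_correction_factor(surface_pressure_hpa,
                                                    water_column_kg_m2)

# Hypothetical dry spring and moist late-summer columns at 965 hPa
spring = dry_air_correction_factor(965.0, 8.0)
summer = dry_air_correction_factor(965.0, 30.0)
print(spring, summer)
```

Since the factor grows with the water column, a moist late-summer atmosphere lifts the trough more than a dry spring atmosphere lifts the peak, which is the mechanism by which the correction reduces the modeled amplitude.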
4.2. Averaging Kernel and Prior
 To investigate whether a realistic averaging kernel significantly alters the seasonal cycle, we “retrieved” modeled CO2 profiles over Park Falls, Wisconsin (United States) by substituting xtrue with xmodel in equation (2). For each TCCON observation at Park Falls between 1 January 2001 and 31 December 2009, the three-dimensional CarbonTracker CO2 field over Park Falls was sampled at the observation time, and then vertically integrated using the averaging kernel and prior CO2 profile for that observation. Daily averages were created from both modeled and observed XCO2. Any possible mismatch between the two owing to the fact that observations only sample daytime columns was thus removed. As a control experiment, the same procedure was followed with a unit averaging kernel for the model data. The two are presented for comparison in Figure 6.
 Retrieving the modeled seasonal cycle with a realistic averaging kernel at Park Falls increases the modeled seasonal cycle amplitude by 0.1 ppm, decreasing the model-observation mismatch by the same amount. We verified that the correction due to the averaging kernel and prior at Lamont was 0.1 ppm as well (not shown). Since the averaging kernels and prior profiles for TCCON CO2 are similar across all the sites [Wunch et al., 2010], we expect the effect of including the averaging kernel to be similar at sites in the northern temperate latitudes.
4.3. Air Mass Dependence of TCCON Data
 The total column CO2 retrieved by a TCCON instrument is corrected for any air mass dependency in published TCCON data sets [Deutscher et al., 2010]. This correction is a function of the solar zenith angle (SZA), which has a seasonal variation, causing any uncorrected air mass dependence to influence the observed seasonal cycle amplitude. The air mass–dependent correction is quite small; XCO2 retrieved from the same vertical profile is higher by ∼1% at 20° SZA compared to 80° SZA [Wunch et al., 2011]. Therefore any uncorrected air mass dependence aliasing into the seasonal cycle is expected to be even smaller [Wunch et al., 2010, 2011]. To quantify the influence of remaining air mass–dependent corrections on the seasonal cycle amplitude, we checked the impact of excluding TCCON data with SZA > 70° where air mass–dependent corrections are most significant (P. Wennberg, personal communication, 2010). As can be seen in Figure 7, if we exclude data with SZA > 70°, the impact on the seasonal cycle amplitude is minimal. For future use we note that at Park Falls this filtering decreases the observed amplitude by 0.1 ppm, decreasing the model-observation mismatch by the same amount.
5. Vertical Profile of the Mismatch
 In section 4 we looked at three factors that affect the total atmospheric column, and saw that they are not sufficient to explain the observed mismatch seen in Figure 3. This raises the question whether there are certain layers in the atmosphere that generate the majority of the mismatch. Aircraft measurements of CO2 at different heights and different seasons can be used to construct a seasonal cycle above the surface, which can then be compared with the modeled seasonal cycle. However, aircraft measurements are not total column measurements, and to assess the impact of mismatches between aircraft CO2 samples and CarbonTracker CO2 fields, we must evaluate how such mismatches affect the total column seasonal cycle amplitude. CarbonTracker CO2 fields, it should be noted, are optimized against surface layer CO2 measurements but not total column or aircraft CO2 measurements.
 In section 2.2 we detailed how to calculate the contribution of different vertical layers to the seasonal cycle in XCO2 and also how to estimate the model-measurement mismatch in the seasonal cycle amplitude from measurements in a partial column. We will now look at how modeled CO2 differs from measured CO2 at different pressure levels. Using graphs from Figures 1 (middle) and 1 (right), we will estimate the impact of these individual mismatches on the total column mismatch.
5.1. Surface Layer: NOAA CMDL
 We compared in situ and flask measurements of CO2 from the NOAA CMDL network [Conway et al., 1994; Bakwin et al., 1998] with CarbonTracker [Peters et al., 2007] output CO2. Both modeled and observed data were detrended and averaged following procedures described in section 2. Model and observation data were cosampled, and filtered to exclude all sampling times before 12:00 and after 18:00 LT to exclude spuriously high mixing ratios due to entrapment of CO2 by a shallow planetary boundary layer.
 In Figure 8, we compare CarbonTracker posterior CO2 fields with observations at three different NOAA CMDL stations. Two of them (Mauna Loa and Barrow) are background stations, i.e., they are not near anthropogenic sources and their localities do not have large biospheric flux variations. Thus the seasonal cycles at those stations represent spatial averages over large areas surrounding the stations, something that CarbonTracker captures very well. At Park Falls, the agreement is not as good, partly due to biases in the biosphere fluxes and partly due to insufficient constraints on the inversion used to derive those fluxes [Peters, 2010]. It should be noted that Park Falls has one of the largest surface seasonal cycles among all the surface stations, and therefore matching it exactly requires quite accurate specification of constraints and their uncertainties.
 From Figure 8 we conclude that modeled CO2 at the surface layer in the northern temperate latitudes faithfully reproduces large-scale spatiotemporal features seen in observations. At Park Falls there are XCO2 measurements, surface measurements, and aircraft measurements. Therefore, we will use Park Falls as a validation station for modeled CO2, i.e., we will compare modeled CO2 at various vertical layers over Park Falls with observed CO2 to draw conclusions about the performance of transport models at various heights, and the contributions from different heights to the mismatch of Figure 3, which will be summarized in section 6.
5.2. Free Troposphere: Aircraft Measurements
 In the free troposphere up to the tropopause, we can use aircraft measurements of CO2 to evaluate model performance. The main difficulty with a comparison between a transport model and aircraft data is sparsity of aircraft data, i.e., aircraft data at many different altitudes over the same location over an entire year or longer, which is necessary to study the seasonal cycle, are rarely available. The Earth System Research Laboratory (ESRL) of the National Oceanic and Atmospheric Administration (NOAA) has conducted overflights over several sites, and some sites have enough data to construct seasonal cycles in the partial columns sampled. We use their data to examine the mismatch between model and observations over the Park Falls TCCON site up to 634 hPa, which is the upper limit for their observations. Above 634 hPa, we utilize data from the Comprehensive Observation Network for TRace gases by AIrLiner (CONTRAIL) project, which samples CO2 concentrations up to 150 hPa. For all the comparisons, CarbonTracker CO2 fields were sampled at the same spatiotemporal locations as the observations. The data were then detrended and averaged according to the methods of section 2.
5.2.1. Park Falls, Wisconsin
 The NOAA Global Monitoring Division (GMD) aircraft data over Park Falls, Wisconsin, provide us with vertical profiles (from 937 hPa to 634 hPa) from 24 August 2002 onward. Comparing them with cosampled CarbonTracker CO2 in Figure 9 and Table 1, we see that the modeled contribution to the total column seasonal cycle amplitude from that layer is 0.47 ppm lower than the overflight data suggest. Since this partial column is only 32% of the total atmospheric column, we estimate the contribution of the tropospheric part of the rest of the column from the CONTRAIL data described in section 5.2.2, which span the free troposphere up to the tropopause.
Table 1. Contribution of Different Pressure Bands to the Total Seasonal Cycle Amplitude Over Park Falls, According to GMD Overflights and CarbonTracker (a)
Pressure Band (hPa)
(a) The total column seasonal cycle peaks on day 107 and troughs on day 232. The bold entries refer to the total column sampled and the resulting mismatch in the seasonal cycle amplitude, obtained by summing up the partial columns.
5.2.2. CONTRAIL at Northern Temperate Latitudes
 The CONTRAIL project [Machida et al., 2008] has been observing vertical CO2 profiles over 43 airports worldwide and along intercontinental flight paths using five Japan Airlines (JAL) commercial airliners. The data coverage is extensive in the northern temperate latitudes (30°N to 60°N, which includes Park Falls), and vertically the samples go up to 150 hPa. In Figure 10 we compare zonal averages of CONTRAIL and CarbonTracker data within the northern temperate latitudes between 5 November 2005 and 28 December 2007 at three different pressure bands from the surface to 150 hPa.
 Visually, CONTRAIL and CarbonTracker agree quite well. We can estimate the effect of the mismatches seen in Figure 10 on the total column amplitude as follows. From CONTRAIL data, we see that the spring peak and fall trough in XCO2 occur approximately on day 95 and day 258 of the year. The difference between the CO2 mixing ratios on those days for each pressure band, i.e., the contribution of the pressure band to the seasonal cycle, is given in Table 2. The contributions are weighted by the layer thicknesses, so a direct addition of the numbers in each row gives the total seasonal cycle amplitude across all the layers. From Figure 1, we know that the contribution of layers above 150 hPa to the seasonal cycle amplitude is at most 10%, and Table 2 shows that the mismatch in the seasonal cycle amplitude from all the other layers is 0.35 ppm in the northern temperate latitudes, which extrapolates to 0.39 ppm for the entire vertical column. Further, we note that according to CONTRAIL, the mismatch between 634 hPa and 150 hPa is 0.26 ppm, a number we will later use to compose the total column mismatch over Park Falls.
Table 2. Contribution of Different Pressure Bands to the Total Seasonal Cycle Amplitude, According to CONTRAIL and CarbonTracker, With a Nominal Surface Pressure of 965 hPa Between 30°N and 60°N, or the Northern Temperate Latitudes
Pressure Band (hPa)
The bold entries refer to the total column sampled and the resulting mismatch in the seasonal cycle amplitude, obtained by summing up the partial columns.
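The extrapolation from the sampled layers to the full column quoted in this section can be reproduced with a short calculation. The sketch below assumes a simple proportional scaling, in which the unsampled layers above 150 hPa carry a fixed fraction (here the 10% upper limit from Figure 1) of the seasonal cycle amplitude; the procedure of section 2.2 may differ in detail, and the function name is ours, not part of any published code.

```python
def extrapolate_to_total_column(partial_mismatch_ppm, unsampled_fraction):
    """Scale a partial-column mismatch to the full column.

    Assumption (ours): the unsampled layers contribute `unsampled_fraction`
    of the total seasonal cycle amplitude, so the sampled layers account
    for the remaining (1 - unsampled_fraction).
    """
    if not 0.0 <= unsampled_fraction < 1.0:
        raise ValueError("unsampled_fraction must lie in [0, 1)")
    return partial_mismatch_ppm / (1.0 - unsampled_fraction)


# Values quoted in the text: 0.35 ppm from the surface to 150 hPa,
# and at most 10% of the amplitude coming from layers above 150 hPa.
total_mismatch_ppm = extrapolate_to_total_column(0.35, 0.10)
print(f"Extrapolated total column mismatch: {total_mismatch_ppm:.2f} ppm")
```

With the quoted inputs this gives 0.39 ppm after rounding, matching the value in the text.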
 It is worth noting here that the 0.11 ppm model-observation mismatch between  hPa and  hPa according to CONTRAIL (Table 2) is much less than the 0.47 ppm mismatch between  hPa and  hPa according to GMD data over Park Falls (Table 1). At least part of this discrepancy comes from the difference in sampling locations; CONTRAIL sampled the troposphere mostly over Asia and Europe, whereas Park Falls is in the continental United States. We should, however, be wary of assigning too much importance to this discrepancy. The data density of CONTRAIL is much higher than that of GMD, resulting in less scatter, which can be seen by comparing Figure 10 with Figure 9. CarbonTracker CO2 fields agree quite well with CONTRAIL data when spatiotemporally cosampled (Figure 10). On the other hand, GMD data near the surface (Figure 9, left) deviate considerably from CarbonTracker. This is possibly the surface mismatch of Figure 8 (right), erroneously extended upward due to our pressure binning of Table 1; i.e., the mismatch close to the surface is incorrectly assigned to the entire layer between 937 hPa and 845 hPa. We suspect this is the case since the mismatch switches sign in the next layer (845 hPa to 735 hPa), indicating that the mismatch at ∼870 hPa was probably close to zero, whereas our analysis assumed a large positive mismatch there. Ideally this problem could be solved by a finer vertical resolution, but in practice the low aircraft data density rendered that approach impractical. The total mismatch of 0.47 ppm was also sensitive to the precise pressure boundaries chosen for Figure 9 and Table 1, whereas that was not the case for Figure 10 and Table 2. Therefore, the 0.47 ppm mismatch of Table 1 should be considered uncertain and at best an upper limit, and the true mismatch lies closer to the CONTRAIL-predicted 0.11 ppm.
5.2.3. Other Stations in the Northern Hemisphere
 NOAA ESRL aircraft data cover several other surface stations in the Northern Hemisphere. Here we consider three locations with high data density and compare them with CarbonTracker. Their locations, the pressure intervals sampled, the mismatches in the partial columns, and the mismatches extrapolated to the total column by the procedure of section 2.2 are summarized in Table 3.
Table 3. Mismatches Between Aircraft Data and CarbonTracker Data at Three Different Locations in the Northern Hemisphere, and the Extrapolated Mismatches in the Seasonal Cycle Amplitude
Pressure Interval Sampled (hPa)
Partial Column Mismatch (ppm)
Estimated Total Column Mismatch (ppm)
Park Falls and Lamont, two sites with both TCCON and aircraft data, are also presented for comparison.
The summer uptake at Lamont is severely affected by drought, resulting in an unusual seasonal cycle near the surface (up to 800 hPa) that is overestimated by CarbonTracker. Therefore, it does not make sense to extrapolate the partial column from the lower half of the atmosphere to the total column at Lamont.
 The estimated total column mismatches in Table 3 are station dependent, but consistently fall below the ∼2 ppm mismatches seen in Figure 2 or the 1.4 ppm mismatch seen in Figure 3. The mismatch seen in Figure 3 is therefore consistent with neither the mismatch between CarbonTracker and CONTRAIL data (which covers up to 150 hPa globally) nor the mismatch between CarbonTracker and NOAA ESRL aircraft data over specific sites. According to both those comparisons, CarbonTracker should be better at estimating the seasonal cycle amplitude than suggested by Figures 2 and 3.
5.3. Age of Air in the Free Troposphere and Stratosphere
 The relatively slow mixing across the tropopause and in the stratosphere, compared to the troposphere, gives rise to a delayed response in the stratosphere to events in the lower troposphere such as biospheric exchange of CO2. This time delay in the stratosphere (compared to the lower troposphere) is referred to as the “age” of stratospheric air [Andrews et al., 2001a; Engel et al., 2008]. Thus the seasonal cycle of CO2 in the stratosphere is not only lower in amplitude compared to the surface, but also phase shifted (Figure 11, left). The phase shift between different layers represents the aging of air across successive layers [Andrews et al., 2001b]. If the vertical transport of CO2 is too slow or too fast in the model, this aging process could deviate from reality, resulting in phase shifts across layers that are too large or too small [Jones et al., 2001]. To test whether this is indeed the problem, we consider the hypothetical scenario in which the seasonal cycles at all layers are in sync, in which case the amplitude of the total column seasonal cycle should be the maximum possible out of all the different phasing/aging scenarios. To achieve this, we artificially shift each model layer in time until the difference in CO2 concentration between days 120 and 250 (which correspond to the spring maximum and fall minimum at Park Falls) is maximized for that layer. The resulting seasonal cycles for the individual layers can be seen in Figure 11 (right), and the resulting seasonal cycle of the total column can be seen in Figure 12.
 Any aging/phasing in nature is unlikely to produce a higher amplitude than the one artificially achieved in Figure 12. We can therefore consider the 0.5 ppm correction to the modeled seasonal cycle amplitude to be the upper limit of a correction to the model because of incorrect aging of air. In reality, the seasonal cycle does have different phases in different layers, and the model is unlikely to get all the phases wrong. Moreover, in Figure 11 we time-shifted all layers and not just stratospheric layers, where the model was suspect. Hence the real error made by incorrectly modeled age of air is likely to be much less than 0.5 ppm.
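The time-shifting experiment of this section can be illustrated with a minimal sketch. The layer amplitudes, phase lags, and pressure weights below are hypothetical values chosen only to mimic a seasonal cycle that weakens and lags with height; they are not CarbonTracker output, and the circular shift via `numpy.roll` is our simplification of the shifting procedure. Only the reference days 120 and 250 are taken from the text.

```python
import numpy as np

DAYS = np.arange(365)
PEAK_DAY, TROUGH_DAY = 120, 250  # spring maximum and fall minimum at Park Falls


def best_shift(layer):
    """Circular shift (in days) that maximizes layer[PEAK_DAY] - layer[TROUGH_DAY]."""
    diffs = []
    for shift in range(layer.size):
        shifted = np.roll(layer, shift)
        diffs.append(shifted[PEAK_DAY] - shifted[TROUGH_DAY])
    return int(np.argmax(diffs))


def align_layers(layers):
    """Shift every layer independently so that its own day-120/day-250 difference is maximal."""
    return np.array([np.roll(layer, best_shift(layer)) for layer in layers])


def column_amplitude(layers, weights):
    """Pressure-weighted column difference between the peak and trough days."""
    column = weights @ layers
    return column[PEAK_DAY] - column[TROUGH_DAY]


# Hypothetical layers, ordered from the surface upward (illustration only).
amplitudes_ppm = np.array([6.0, 4.0, 2.0, 1.0])
lags_days = np.array([0, 20, 60, 120])
weights = np.array([0.30, 0.30, 0.25, 0.15])  # fractional pressure thickness, sums to 1

layers = np.array([
    a * np.cos(2.0 * np.pi * (DAYS - PEAK_DAY - lag) / 365.0)
    for a, lag in zip(amplitudes_ppm, lags_days)
])

original_amp = column_amplitude(layers, weights)
aligned_amp = column_amplitude(align_layers(layers), weights)
print(f"original: {original_amp:.2f} ppm, aligned: {aligned_amp:.2f} ppm")
```

Because the column difference is a positively weighted sum of the per-layer differences, and each layer's difference is maximized separately (with a zero shift among the candidates), the aligned amplitude can never be smaller than the original one. This is why the correction obtained in this way acts as an upper limit.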
6. Synthesis of Mismatches
 To summarize, we construct an “error budget” of the mismatch between TCCON and CarbonTracker seasonal cycle amplitude over Park Falls (Table 4).
Table 4. The Mismatch Between Observations and CarbonTracker of the Seasonal Cycle Amplitude Over Park Falls
Different sources of mismatch are investigated at different layers.
 We see that the mismatch budget of Table 4, when compared with the mismatch of 1.4 ppm of Figure 3, leaves ∼0.4 ppm unexplained.
7. Sources of the Remaining Mismatch
 The error in the mismatch budget of Table 4 is a combination of the deficiencies in modeling and in observations. Below we briefly discuss possible contributions from each of these two factors.
7.1. Surface Flux Uncertainty
 In section 3 all the transport models were fed with optimized CarbonTracker biosphere fluxes, which, while statistically consistent with point measurements when propagated by the TM5 transport model, need not be exact. For example, in Figure 8 we see that the summer uptake of CO2 near Park Falls is not completely captured by CarbonTracker near the surface. This mismatch comes partly from insufficient constraints on biosphere fluxes near Park Falls and partly from summer uptake of CO2 in areas upwind of Park Falls, such as Siberia, that are not captured by CarbonTracker optimized fluxes [Peters, 2010]. To estimate the effect of biosphere flux variability on our analysis, we ran a single transport model, TM5, for four years (1 January 2003 to 31 December 2006) with optimized monthly biosphere fluxes from six different inverse modeling frameworks [Peylin et al., 2009]. These optimizations were performed within the framework of the TRANSCOM model intercomparison experiment by various research groups (https://transcom.lsce.ipsl.fr). The resulting total column seasonal cycles at three different TCCON sites are plotted in Figure 13. Figure 13 is analogous to Figure 2, except that the different curves now correspond to different biosphere flux specifications run forward with the same transport model.
 Since the fluxes of Figure 13 were propagated by TM5, the seasonal cycle amplitudes suffer in principle from the transport model shortcomings behind the 1 ppm mismatch of Table 4. If we were to add this “correction” of 1 ppm to the modeled seasonal cycle amplitudes of Figure 13, the corrected model amplitudes at Park Falls would be 7.5–9.3 ppm, a range that includes the observed amplitude of 9.1 ppm. It is therefore possible that the unaccounted 0.4 ppm mismatch of section 6 arises entirely from insufficiently constrained CarbonTracker surface fluxes. Whether the constraints can be improved by including satellite data is a question that merits further investigation.
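The range check in the preceding paragraph amounts to a simple piece of arithmetic, sketched below. The uncorrected extremes of 6.5 and 8.3 ppm are not quoted in the text; they are inferred here by subtracting the 1 ppm correction from the corrected range of 7.5–9.3 ppm, and the function name is purely illustrative.

```python
def corrected_amplitude_range(modeled_amplitudes_ppm, transport_correction_ppm):
    """Add a uniform transport correction and return the (min, max) corrected amplitude."""
    corrected = [a + transport_correction_ppm for a in modeled_amplitudes_ppm]
    return min(corrected), max(corrected)


# Inferred (not quoted) extremes of the modeled amplitudes at Park Falls.
modeled_extremes_ppm = [6.5, 8.3]
transport_correction_ppm = 1.0  # accounted-for mismatch of Table 4
observed_amplitude_ppm = 9.1    # observed amplitude at Park Falls

low, high = corrected_amplitude_range(modeled_extremes_ppm, transport_correction_ppm)
observed_within_range = low <= observed_amplitude_ppm <= high
print(f"corrected range: {low:.1f}-{high:.1f} ppm, observation inside: {observed_within_range}")
```

The observed amplitude falls inside the corrected range only near its upper end, which is consistent with the statement that flux uncertainty could, but need not, explain the entire remaining mismatch.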
7.2. Errors in Observational Data
 TCCON measurements are calibrated against on-site aircraft data at multiple locations [Wunch et al., 2010]. In so far as the aircraft measurements adhere to the World Meteorological Organization (WMO) standard, the TCCON total columns at the time of aircraft overflights can be assumed to be accurate within the WMO standard. However, the aircraft calibration campaigns do not provide long time series, so it is possible that seasonal biases remain in calibrated TCCON data. At Park Falls, this bias was estimated to be less than 0.3 ppm, so at worst only a part of the 0.4 ppm mismatch could originate from an uncorrected seasonal bias in the TCCON data (D. Wunch, personal communication, 2010).
8. Concluding Remarks
 Current inverse modeling frameworks for atmospheric CO2 optimize surface fluxes to be statistically consistent with surface point measurements. In this paper we have demonstrated that such optimized fluxes, when propagated by different transport models, consistently yield a lower-than-measured seasonal cycle amplitude of total column CO2 at multiple sites in the northern temperate latitudes. Focusing on one particular site, Park Falls, we have shown that up to 1 ppm of the 1.4 ppm mismatch there can be accounted for by various factors affecting both modeled CO2 fields, such as transport problems in the free troposphere and stratosphere, and TCCON observations, such as air mass dependence.
 Recent focus on adding satellite measurements of total column CO2 to inversion frameworks makes the remaining ≥0.4 ppm mismatch significant, since it is at the edge of the ≲0.4 ppm accuracy requirement on satellite measurements to improve on our current estimate of surface fluxes [Houweling et al., 2010; Ingmann et al., 2008; Chevallier et al., 2010]. In the best case, we can “fix” up to 1 ppm of the mismatch of Figure 3, implying that satellite and TCCON measurements can be used in inversion frameworks to significantly improve on our state of knowledge of surface fluxes of CO2. In the more realistic case, remembering that the numbers from Table 1 and section 5.3 are upper bounds, the unaccounted mismatch will be more than 0.4 ppm, meaning that we will need to overcome both model and observational shortcomings to assimilate total column (TCCON and satellite) CO2 observations in inverse models.
 The possibility cannot be ruled out that most of the remaining ≥0.4 ppm mismatch comes from uncertainties in the surface flux used to drive the transport models of Figures 2 and 3. To correct for the accounted-for mismatches from Table 4, we need to (1) improve transport models to remove mismatches coming from different pressure levels, (2) put stricter constraints on surface fluxes by adding more measurements, including total column measurements, and (3) quantify any remaining observational errors. As modelers, we plan to further investigate the first two questions in future research projects.
 We are thankful to Debra Wunch for insights into the TCCON measurements, to Rachel Law for several helpful comments on the manuscript, to Arlyn Andrews for aircraft data over Park Falls, to the NOAA ESRL Cooperative Air Sampling Program for flask and in situ CO2 data, and to the TRANSCOM modelers for providing CO2 fields and optimized fluxes from the TRANSCOM 4 model intercomparison experiment. CarbonTracker 2010 results were provided by NOAA ESRL, Boulder, Colorado, from the Web site at http://carbontracker.noaa.gov. TCCON data were obtained from the TCCON Data Archive, operated by the California Institute of Technology, from their Web site at http://tccon.ipac.caltech.edu/. This work was supported by NWO grant GO-AO/13.