 A forward atmospheric transport modeling experiment has been coordinated by the TransCom group to investigate synoptic and diurnal variations in CO2. Model simulations were run for biospheric, fossil, and air-sea exchange of CO2 and for SF6 and radon for 2000–2003. Twenty-five models or model variants participated in the comparison. Hourly concentration time series were submitted for 280 sites along with vertical profiles, fluxes, and meteorological variables at 100 sites. The submitted results have been analyzed for diurnal variations and are compared with observed CO2 in 2002. Mean summer diurnal cycles vary widely in amplitude across models. The choice of sampling location and model level accounts for part of the spread, suggesting that representation errors in these types of models are potentially large. Despite the model spread, most models simulate the relative variation in diurnal amplitude between sites reasonably well. The modeled diurnal amplitude only shows a weak relationship with vertical resolution across models; differences in near-surface transport simulation appear to play a major role. Examples are also presented where there is evidence that the models show useful skill in simulating seasonal and synoptic changes in diurnal amplitude.
 Predictions of future climate change rely on our ability to predict the uptake of fossil CO2 from the atmosphere to the biosphere and oceans. These CO2 fluxes can be inferred from atmospheric measurements of CO2. Our global knowledge of atmospheric CO2 concentration is derived from two sources: laboratory-based measurements of flask samples that are typically collected at weekly to 2-weekly intervals under “clean air” conditions and in situ instruments that sample the atmosphere continuously. In situ instruments are relatively expensive to set up and maintain but the synoptic and diurnal variations that their records capture are an important complement to the more extensive flask network. Whereas flask collection is intended to capture baseline air and therefore see mostly global and continental-scale variations in CO2, the in situ records sample the full variability of atmospheric concentration and therefore contain much more information about local and regional CO2 fluxes, especially on continents.
 Any estimation of regional CO2 fluxes from atmospheric measurements requires a model of the transport of CO2 from the input region to the sampling locations. The transport is driven by analyzed or climate model winds, and subgrid-scale transport is usually parameterized. Over the last decade a collaborative group, TransCom, has worked to compare transport models and assess the impact of differences in transport on inverted CO2 fluxes. Initially, a simple forward experiment compared monthly and annual concentrations from fossil and biosphere CO2 fluxes [Law et al., 1996]. From the experience gained in that experiment a more ambitious intercomparison was undertaken in which annual mean, seasonal cycle, and interannual inversions were compared [Gurney et al., 2002, 2004; Baker et al., 2006]. In each case the CO2 concentrations used were, at most, monthly. Also little, if any, nonbaseline data were included.
 Recently, individual groups have begun to include continuous data in their inversions. Law et al.  and Law  use synthetic CO2 data to explore methodological issues with inversions of continuous data. Methodology is also developed by Peylin et al.  in a 1 month test inversion for Europe using daily average CO2 from six locations. Rödenbeck  highlights the issue of data selection and data weighting in the use of in situ data. In particular, he uses only afternoon data from continental sites. Both Rödenbeck  and Peylin et al.  note the value of using time series of CO2 concentration derived from the prior fluxes input to the inversion to assess whether any given synoptic feature is likely to be well simulated by the transport model. This emphasizes the demand that this type of inversion places on the transport model; unless the model is able to reliably model CO2 concentrations on the timescales at which the data are input, transport error will be propagated into the flux estimates.
 We need better assessments of whether this need for reliable model transport at synoptic and diurnal timescales can be met and, if not, how any shortcomings can be remedied. The challenge is to do this in a systematic and automated manner given the variety of modeling requirements for different sites. For example, at continental sites a critical issue is vertical mixing and whether nighttime data should be discarded because the boundary layer gets too shallow to model. At coastal sites the requirement is to correctly predict offshore or onshore flow whether from sea breeze circulations or synoptic systems. At mountain sites, we need to test which model level is most appropriate for the altitude of the site given model terrain resolution limitations.
 Global transport models have been used to assess synoptic variations at sampling sites [e.g., Heimann et al., 1989], but more recent studies have tended to focus on regional models with higher horizontal resolution. For example, Brandefelt and Holmén  modeled winter CO2 at Ny-Alesund in the Arctic, Chevillard et al.  simulated CO2 for European and Siberian sites for July 1998, while Geels et al.  simulated synoptic variations at two North American and two European sites from 1990–1998. A comparison of mostly regional models has also been performed for European sites [Geels et al., 2007]. Here we present results from a comparison of mostly global models. This is particularly timely as many global models are now being run with horizontal resolution of 1–2°. The experiment was coordinated by the TransCom group with the aim of better understanding the behavior of transport models at synoptic and diurnal timescales.
 This paper presents an introductory description of the exercise. The experimental protocol is described in section 2, including the complete set of input fluxes, the participating models are described in section 3, and the data processing and observations are described in section 4. In section 5 a subset of the modeled tracers is used to compare modeled and observed diurnal cycles. We first provide an overview of modeled summer diurnal amplitude at a wide range of sites and then provide illustrative examples of other aspects of modeled diurnal cycle behavior at a small number of sites. An overview of synoptic variations from the experiment is presented by P. K. Patra et al. (TransCom model simulations of hourly atmospheric CO2: Analysis of synoptic scale variations for the period 2002–2003, submitted to Global Biogeochemical Cycles, 2007).
2. Experiment Description
 The experiment was designed to be relatively simple to encourage maximum participation. The transport models were run at their host institution with model output submitted to a central ftp server. The models were run for 2 years, 2002 and 2003, preceded by 2 years for spin-up, for nine tracers using prescribed surface fluxes. The tracers were biospheric CO2 (five variants), fossil CO2, ocean CO2, SF6, and radon-222. The non-CO2 tracers provide useful diagnostics of atmospheric transport. The simulations were initialized with zero concentration throughout the atmosphere. Full details are given in the experimental protocol [Law et al., 2006a].
 The surface fluxes for the biospheric CO2 tracers were provided by the SiB3 and CASA [Randerson et al., 1997] process models. The SiB model was forced with NCEP2 meteorology data [Kalnay et al., 1996] and GiMMSg NDVI [Brown et al., 2004] and was run for 2000–2003 repeatedly to give an adequate spin-up period. A final run was used to ensure a zero net annual carbon flux [Denning et al., 1996]. The SiB fluxes (I. Baker et al., Global net ecosystem exchange (NEE) fluxes of CO2, unpublished data, 2005) were used at hourly (SiB), daily (SiB_day), and monthly (SiB_mon) resolution. The daily and monthly fluxes were created by averaging the hourly fluxes. The input fluxes were taken to be applicable at 30 min past the hour, at 1200 UT, and at midmonth, respectively.
 The CASA fluxes were used at 3-hourly (CASA) and monthly (CASA_mon) resolution and have a zero annual mean flux everywhere. The monthly fluxes [Randerson et al., 1997] are the same as those used in the TransCom 3 inversions [e.g., Gurney et al., 2003]. The 3-hourly fluxes were generated by adding diurnal variability to the monthly fluxes of gross primary production (GPP) and respiration. The variability was generated using 2 m temperature and surface short-wave radiation from the ECMWF analyses (http://www.ecmwf.int/research/ifsdocs/CY28r1/index.html) at 1° × 1°. The respiration was then rescaled to maintain the same monthly average flux as the original net ecosystem production (NEP) as described by Olsen and Randerson . The 3-hourly fluxes were taken as applicable to 0130 UT and each 3 h following. The monthly fluxes were taken as midmonth values.
 Fossil fuel emissions (fossil98) were kept constant throughout the simulation. Spatial emission patterns were taken from the EDGAR 1° × 1° map for 1990 [Olivier and Berdowski, 2001], scaled on a country level to emission totals for 1998 given by an earlier version of Marland et al. .
 The spatial distribution of SF6 fluxes was taken from the EDGAR-95 emissions database (http://www.rivm.nl/edgar) and was scaled by prescribed annual emissions. The annual value was taken to be applicable for the middle of the year with linear interpolation used between midyears. The annual emission totals were chosen to match the global growth rate of SF6 as defined by observations from the NOAA ESRL Cooperative Air Sampling Network similar to Peters et al. .
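The midyear interpolation of annual SF6 emission totals described above can be sketched as follows. This is only an illustration: the function name `emission_at`, the decision to hold the value constant outside the first and last midyear, and the numbers in the usage example are assumptions, not values or code from the experiment.

```python
def emission_at(decimal_year, annual_totals):
    """Linearly interpolate annual emission totals between midyears.

    annual_totals maps a calendar year to its annual total; each total is
    taken to apply at the middle of that year (year + 0.5).
    Assumption: values are held constant before the first and after the
    last midyear.
    """
    years = sorted(annual_totals)
    midpoints = [year + 0.5 for year in years]
    if decimal_year <= midpoints[0]:
        return annual_totals[years[0]]
    if decimal_year >= midpoints[-1]:
        return annual_totals[years[-1]]
    for k in range(len(years) - 1):
        left, right = midpoints[k], midpoints[k + 1]
        if left <= decimal_year <= right:
            weight = (decimal_year - left) / (right - left)
            return ((1.0 - weight) * annual_totals[years[k]]
                    + weight * annual_totals[years[k + 1]])


# Placeholder totals in arbitrary units (not the experiment's emissions).
example_totals = {2001: 10.0, 2002: 12.0, 2003: 16.0}
start_of_2002 = emission_at(2002.0, example_totals)  # halfway between 2001 and 2002 totals
```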
 Each of the preceding sets of fluxes was provided to modelers at 1° × 1° or 0.5° × 0.5° resolution. Each modeler was responsible for any regridding that was required for their model resolution. In performing the regridding, it was recommended that each model's land mask be used to ensure that land fluxes were restricted to land grid points and ocean fluxes to ocean grid points, but not all models followed this recommendation. Following the regridding, the global total flux was adjusted to match the total flux on the original grid, thus ensuring the same total input flux was prescribed in each model. In most models the prescribed fluxes were linearly interpolated in time to each model time step; a small number of models kept the fluxes fixed for the hourly/3-hourly fluxes.
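One way to carry out the adjustment of the global total after regridding is a uniform multiplicative rescaling, sketched below. The protocol text does not say how each group made the adjustment, so the multiplicative approach, the regular latitude-longitude grid, the Earth radius value, and all function names are assumptions made for illustration. A multiplicative factor is undefined for a field whose global total is zero, which the sketch reports as an error.

```python
import math

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in metres


def cell_areas(n_lat, n_lon):
    """Area (m^2) of one cell in each latitude row of a regular global grid."""
    dlat = math.pi / n_lat
    dlon = 2.0 * math.pi / n_lon
    areas = []
    for j in range(n_lat):
        south = -math.pi / 2.0 + j * dlat
        north = south + dlat
        areas.append(EARTH_RADIUS_M ** 2 * dlon
                     * (math.sin(north) - math.sin(south)))
    return areas


def global_total(flux, n_lat, n_lon):
    """Area-integrated flux, where flux[j][i] is a flux per unit area."""
    areas = cell_areas(n_lat, n_lon)
    return sum(areas[j] * flux[j][i]
               for j in range(n_lat) for i in range(n_lon))


def rescale_to_total(regridded, n_lat, n_lon, target_total):
    """Scale a regridded field uniformly so its global integral matches target_total."""
    current = global_total(regridded, n_lat, n_lon)
    if current == 0.0:
        raise ValueError("cannot rescale a field with zero global total")
    factor = target_total / current
    return [[value * factor for value in row] for row in regridded]
```

In use, `target_total` would be `global_total` evaluated on the original 1° × 1° or 0.5° × 0.5° field, and `regridded` the same field after interpolation to the model grid.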
 The radon surface flux was not provided as a gridded field. Each modeler created their own radon flux field based on their model land mask using values of 1.66 × 10⁻²⁰ mol m⁻² s⁻¹ for land equatorward of 60°, 8.30 × 10⁻²³ mol m⁻² s⁻¹ for ocean equatorward of 70° and land between 60° and 70°, and zero poleward of 70°. The radon flux was kept constant throughout the simulation. Radon in the atmosphere was decayed with a half-life of 3.8 days.
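The radon flux rule and the radioactive decay can be written compactly as in the sketch below. The flux values and half-life are those given in the text; the function names and the treatment of points lying exactly on the 60° and 70° boundaries are assumptions.

```python
import math

RADON_HALF_LIFE_DAYS = 3.8
LAND_FLUX = 1.66e-20  # mol m^-2 s^-1, land equatorward of 60 degrees
WEAK_FLUX = 8.30e-23  # mol m^-2 s^-1, ocean equatorward of 70 and land at 60-70 degrees


def radon_flux(lat_deg, is_land):
    """Prescribed radon surface flux for a grid point.

    Assumption: a point exactly at 60 degrees is treated as the 60-70 band
    and a point exactly at 70 degrees as poleward of 70.
    """
    abs_lat = abs(lat_deg)
    if abs_lat >= 70.0:
        return 0.0
    if is_land and abs_lat < 60.0:
        return LAND_FLUX
    return WEAK_FLUX


def decay(concentration, dt_seconds):
    """Apply exponential radioactive decay over a time step of dt_seconds."""
    decay_rate = math.log(2.0) / (RADON_HALF_LIFE_DAYS * 86400.0)
    return concentration * math.exp(-decay_rate * dt_seconds)
```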
2.2. Model Output
 For each tracer, hourly concentrations were submitted for 280 sampling locations. The site list is given in Appendix 1 of the protocol [Law et al., 2006a]. The list of locations was chosen to include current and proposed in situ and flask sampling sites for CO2. Modelers chose how to sample their model to provide the required data. Most provided the nearest model grid point to the given location while others used linear interpolation between grid points either horizontally or vertically or both. For high-altitude sites, most modelers chose a nonsurface model level. For coastal sites, modelers were asked to submit an offshore and onshore grid point.
 Additional information was submitted for 100 locations [Law et al., 2006a, appendix 2], most of which were also in the list of 280 sites. This list included locations where in situ CO2 data were available for 2002 and/or 2003. Tracer concentrations, u and v component winds, and pressure were saved for all model levels up to approximately 500 hPa. Surface pressure, cloud cover, planetary boundary layer height, and surface trace gas flux were also saved. Separate files were submitted for each tracer and year.
3. Participating Models
 Twenty-five models or model variants submitted data to the experiment. The model variants mostly involved a change in resolution but sometimes included a change in forcing meteorology. The models are listed along with some of their key characteristics in Table 1. The models are divided into two broad groups, online and offline models. In online models (indicated in bold), the tracer transport occurs within a full global climate model. In these cases, the model meteorology is generally kept close to analyses by nudging wind (e.g., AM2, CCAM) and sometimes temperature (e.g., AM2t, CCSR_NIES1/2) toward the analyzed values. In some models, nudging is not performed in the lowest model levels. By contrast, offline models simulate tracer transport only and take analyzed winds as input. Other meteorological variables or mass fluxes are usually input in order to simulate subgrid-scale tracer transport.
Bold font indicates online models, and italic font indicates that the model was run online for meteorology and tracers were run offline as a second step.
AIST: National Institute of Advanced Industrial Science and Technology, Japan; CCSR: Center for Climate System Research, Japan; CSIRO: Commonwealth Scientific and Industrial Research Organisation, Australia; CSU: Colorado State University, USA; ECMWF: European Centre for Medium-Range Weather Forecasts; ECN: Energy Research Centre of the Netherlands; FRCGC: Frontier Research Center for Global Change, Japan; GFDL: Geophysical Fluid Dynamics Laboratory, USA; GSFC: NASA Goddard Space Flight Center, USA; JMA: Japan Meteorological Agency; LLNL: Lawrence Livermore National Laboratory, USA; LSCE: Laboratoire des Sciences du Climat et de l'Environnement, France; MPIBGC: Max-Planck-Institute for Biogeochemistry, Germany; NERI: National Environmental Research Institute, University of Aarhus, Denmark; NIES: National Institute for Environmental Studies, Japan; ESRL: NOAA Earth System Research Laboratory, USA; SRON: Netherlands Institute for Space Research.
Longitude × latitude or distance or spectral resolution indicated by T (triangular) maximum wave number.
σ vertical coordinates are pressure divided by surface pressure; η vertical coordinates are a hybrid sigma-pressure coordinate; z* is a terrain-following coordinate, z* = z_T(z − z_s)/(z_T − z_s), where z_s is surface height and z_T is the height of the top of the model domain.
Approximate height above the surface of the midpoint of the lowest model level.
Temporal resolution of the input meteorology fields; model time steps are much shorter.
Tracer is mixed through the first four levels of this model so the effective first level height is 52 m.
 All the models are global except for CHIMERE, COMET, DEHM, and REMO. CHIMERE is an Eulerian mesoscale model, which has been run for western Europe at 50 km resolution with boundary conditions from LMDZ. COMET is a Lagrangian model and only simulates two levels. The lower level represents the planetary boundary layer while the upper level represents the rest of the atmosphere. All submitted data were taken from the lower level and hence will not be representative of high-altitude sites. COMET simulates concentration deviations from background concentrations, and for this experiment its domain has been restricted to Europe. DEHM has been run for the Northern Hemisphere at 150 km resolution with a 50 km nested region over Europe. TM3_vfg has been used to provide initial and boundary conditions for DEHM. REMO is run in forecast mode with respect to meteorology (consecutive 1-day forecasts starting from analyses) in combination with continuous tracer transport. It was run for 30–90°N.
 Initial quality checks were performed on all submissions, and some submissions were revised when errors in the simulations were found. Annual mean concentrations were calculated for all tracers, and the resulting distributions with latitude were similar to those found in previous TransCom comparisons [Law et al., 1996; Gurney et al., 2003]. The interhemispheric gradient can be compared with observations for SF6. We have calculated the Northern Hemisphere minus Southern Hemisphere concentration difference using a spline fit to the 2002 mean concentration at marine boundary layer sites for the 21 global models and observations. The modeled interhemispheric difference ranges from 0.21 to 0.55 ppt with a mean of 0.28 ± 0.07 ppt compared to 0.23 ppt for observed SF6 in 2002. Two models lie outside the one standard deviation range: NIES05 (0.37 ppt) and CDTM (0.55 ppt). In both cases, weak vertical mixing is thought to be the cause of the increased gradient. In CDTM, the thick surface layer may contribute to the weak vertical mixing.
 We also checked the seasonal behavior of models, particularly for the biospheric tracers. We confirmed that seasonal cycles at remote marine sites were the same regardless of the temporal frequency of the biospheric fluxes; that is, monthly mean fluxes gave the same results as hourly or 3-hourly fluxes. We also compared seasonal amplitudes with those observed for Northern Hemisphere sites where the biospheric flux dominates this signal. We find that the simulations with SiB fluxes overestimate the peak-to-peak amplitude of the seasonal cycle (21.0 ± 3.4 ppm compared to 14.2 ppm observed for the average of 13 marine boundary layer sites between 45° and 90°N), while those with the CASA fluxes slightly underestimate the amplitude (12.8 ± 2.0 ppm). For both the SiB and CASA cases, the REMO amplitude is just below one standard deviation from the model mean, while AM2t, CCAM, and STAGN amplitudes are above the mean plus one standard deviation. Overall, most model behavior for the large scales appears to be realistic, but this may not be indicative of how a model will perform for the shorter timescales of interest here. For this reason we have not excluded any model from further analysis on the basis of its large-scale performance.
 The submitted data offer a multitude of possibilities to analyze model behavior across a range of target timescales (such as diurnal, synoptic, seasonal, and time-mean). As a starting point, this paper focuses on diurnal variations, and we choose to use a subset of the tracers at a subset of sampling sites. The sites we consider are those with continuous observations available during 2002. We located 48 sites with calibrated CO2 data available for 2002 or 2003. Most of these were sourced from the World Data Centre for Greenhouse Gases (http://gaw.kishou.go.jp/wdcgg.html) with additional sites through personal contact with the scientists responsible for those sites. In Table 2 we list the subset of these sites that we use in this paper because they have a significant diurnal cycle and sufficient data coverage through the period of interest. The observations are typically hourly averages which have been selected to remove obvious analytical errors but not selected for meteorological conditions. Therefore the records may contain a mixture of samples that are representative of regional fluxes and samples that are strongly influenced by local fluxes. One of the challenges of using in situ data is to determine which parts of the record can be reliably modeled. Here we will use the model simulations to illustrate the range of issues that need to be considered when trying to compare models and observations at hourly timescales. Our initial use of the observations is rather indiscriminate; the aim is to illustrate general issues in model-observation comparisons rather than to give the best comparison at any given site.
Table 2. Location of in Situ CO2 Sites Used for Assessing the Diurnal Cycle
AERC: Aichi Environmental Research Center, Japan; AIST: National Institute of Advanced Industrial Science and Technology, Japan; CAMS: Chinese Academy of Meteorological Sciences; CESI RICERCA: Italian Electrical Experimental Center; CESS: Center for Environmental Science in Saitama, Japan; ECN: Energy Research Centre of the Netherlands; FEA: Federal Environment Agency, Austria; FMI: Finnish Meteorological Institute; Harvard: Harvard University, USA; HMS: Hungarian Meteorological Service; IAFMS: Italian Air Force Meteorological Service; JMA: Japan Meteorological Agency; KMA: National Institute of Meteorological Research, Korea Meteorological Administration; LSCE: Laboratoire des Sciences du Climat et de l'Environnement, France; MSC: Meteorological Service of Canada; NIES: National Institute for Environmental Studies, Japan; NOAA: National Oceanic and Atmospheric Administration, USA; Scripps: Scripps Institution of Oceanography, USA; UBA: Umweltbundesamt, Germany.
These sites were not included in the “allsite” list but were included in the “contsite” list for which vertical profiles were submitted.
Sampling occurs at multiple levels on a tower. The altitude given is the surface. Samples are taken at 20, 60, 120, and 200 m at CBW, at 10, 48, 82, and 115 m at HUN, at 11, 30, 76, 122, 244, and 396 m at LEF, at 1, 3, 10, 20, 29, 39, 51, and 62 m at TPJ.
This site was not included in either list of sites to be submitted but can be represented by the surface layer submission for DDR.
These sites were included in the “allsite” list but not the “contsite” list.
The latitude/longitude for Westerland should be 54.92°N, 8.31°E but was incorrectly listed in the input files for the experiment.
 As we are only concerned with diurnal variations here, we have removed the trend and seasonal cycle from both the modeled and observed time series, C, to produce a time series of concentration residuals, C_resid. The trend and seasonal cycle are represented by a function of t fitted to C, where t is time in years (i.e., from 0 to 1) and a_n are the constants determined by the fit. The fit was performed separately for each model and location. For comparison with the observations we use the sum of three of the modeled tracers, namely CASA (3-hourly), fossil98, and Taka02 (the ocean CO2 tracer). While this does not provide a complete representation of CO2 exchange with the atmosphere, it does capture the major fluxes that we expect to produce most of the atmospheric variation of CO2.
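A minimal sketch of this kind of detrending is given below. It assumes, purely for illustration, that the fitted function is an offset plus a linear trend plus annual and semiannual harmonics in t; that basis, the least-squares solver, and all function names are assumptions rather than the stated method. Missing observations would need to be dropped before calling `residuals`.

```python
import math


def design_row(t):
    """Assumed basis: offset, linear trend, annual and semiannual harmonics."""
    return [1.0, t,
            math.cos(2 * math.pi * t), math.sin(2 * math.pi * t),
            math.cos(4 * math.pi * t), math.sin(4 * math.pi * t)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_coefficients(times, conc):
    """Least-squares constants a_n, found from the normal equations."""
    rows = [design_row(t) for t in times]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * c for r, c in zip(rows, conc)) for i in range(n)]
    return solve_linear(ata, atb)


def residuals(times, conc):
    """Concentration minus the fitted trend and seasonal cycle."""
    coeffs = fit_coefficients(times, conc)
    fitted = [sum(a * b for a, b in zip(coeffs, design_row(t))) for t in times]
    return [c - f for c, f in zip(conc, fitted)]
```

Because the assumed basis contains only slowly varying terms, an hourly signal with a 24 h period passes through to the residuals almost unchanged.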
 There are a number of general issues that we need to be aware of when comparing model and observed data. Model time series are complete, and universal time is used. Observed time series have missing values and may have been provided in local or universal time. For the models, the hourly values may be a snapshot taken at a single model time step or an average across model time steps, while the observations are usually averages of multiple samples. At some sites it was not always clear whether the time stamp for the observations was the start, middle, or end of the hour for which the observations had been averaged. At this stage we have ignored this difference in averaging hour.
5. Results for Diurnal Variations
 Many of the observational records contain large diurnal cycles, particularly in summer. Diurnal variations can occur because of diurnal variations in fluxes as well as diurnal variations in meteorology and can be amplified by the covariance of both. They are thus a useful test for aspects of model behavior such as boundary layer processes. There are a number of features of the diurnal cycle that can be explored using the model output: the amplitude and phase of the mean diurnal cycle as well as the variability of daily amplitudes and seasonal changes in amplitude. We have chosen here to focus first on diurnal amplitude and compare modeled and observed amplitudes at a wide range of sites. We show how part of the model spread can be explained through sampling choices. We then present other aspects of the diurnal cycle through examples at a small number of sites. These are intended to illustrate how the data set could be used for more detailed analysis in future work.
 We begin our analysis here with the mean diurnal cycle in summer, since this is when diurnal cycles are largest and any model differences will be most evident. We consider only the Northern Hemisphere summer (June–August). There is one Southern Hemisphere site (TPJ) located in the interior of a continent and, being tropical, it has a large diurnal cycle all year. The concentration residuals for June–August are averaged for each hour of the day to produce the mean diurnal cycle.
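The hour-of-day averaging can be sketched as follows. The function names are illustrative, the selection of June–August values is left to the caller, and taking the amplitude as the maximum minus the minimum of the mean cycle is an assumption about the definition used.

```python
def mean_diurnal_cycle(hours_of_day, residuals):
    """Average residuals for each hour of the day (0-23).

    Missing values are passed as None and skipped; an hour with no valid
    data yields None in the returned 24-element list.
    """
    sums = [0.0] * 24
    counts = [0] * 24
    for hour, value in zip(hours_of_day, residuals):
        if value is None:
            continue
        sums[hour] += value
        counts[hour] += 1
    return [s / n if n else None for s, n in zip(sums, counts)]


def diurnal_amplitude(cycle):
    """Peak-to-peak amplitude (assumed definition): max minus min of the mean cycle."""
    valid = [v for v in cycle if v is not None]
    return max(valid) - min(valid)
```

For observed records supplied in local time, the hour index would need converting to universal time (or the model output to local time) before the two cycles are compared.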
5.1. Amplitude of the Mean Summer Diurnal Cycle
Figure 1 shows the amplitude of the mean diurnal cycle for most Northern Hemisphere sites and TPJ. We do not show remote/island sites where any observed diurnal cycle must be from local sources and meteorology, which would not be resolved by the models. We confirmed that these remote sites had zero or close to zero modeled diurnal amplitude. We only show a selection of sampling heights for locations with towers. The sites are grouped into those with altitudes greater than 800 m, ordered by altitude, and those below 800 m, ordered by the magnitude of the observed diurnal amplitude. The model results presented here are derived from the submitted files in which each modeler chose where to sample their model grid for that site. We will show that those sampling choices account for some of the model spread seen in Figure 1. A similar figure (not shown) prepared with the SiB (hourly) fluxes rather than the CASA fluxes gave slightly smaller diurnal amplitudes in general but similar model spread to the CASA case.
 Two groups of sites are particularly vulnerable to “sampling spread”: mountain and coastal sites. For coastal sites (marked with an asterisk on the x axis), modelers were requested to provide onshore and offshore samples. Here we have plotted the average amplitude from these two submissions. Note that for any particular model, this may not give the best comparison with the surface observations. For mountain sites, some modelers chose to submit surface layer data while most chose to sample from a model level that they considered representative for that site. For the sites with altitudes above 800 m, we have shown surface layer amplitudes with an open circle rather than the usual diagonal cross. For most mountain sites (e.g., CMN, PRS), this shows a clear split between the surface layer amplitudes, which are larger than observed, and nonsurface samples, which give lower amplitudes than observed. The only marked site for which there is an overlap between the surface and nonsurface samples is DDR with an altitude of 860 m. Here other model differences also contribute to the amplitude range. The selection of an appropriate model level to represent a mountain site is not easy and we discuss this further in section 5.1.2.
 Sampling choices in the horizontal also contribute to the spread in model amplitudes. Some modelers chose to interpolate to the site location while others selected the nearest model grid point. In regions where the surface fluxes have large spatial heterogeneity, the grid box sampled by each model might contain quite different surface fluxes despite the same flux field being prescribed. These different surface fluxes explain much of the model spread at sites such as MKW in Japan (see section 5.1.1) and NGL in Germany.
 At most sites, the observed amplitude lies in the middle of the modeled range but there are a few exceptions. At NGL and ZGT in Europe and LEF011 in America the modeled amplitudes are almost all smaller than observed (for SiB as well as CASA fluxes) while at TPJ and HVF in America almost all models overestimate the observed amplitudes. These differences could indicate that the input fluxes (biospheric and/or fossil) were not representative for these sites but the representativeness of the comparison data is also an issue in some cases. For example, we tested the sensitivity of the observed diurnal amplitude to wind speed by excluding data for the lowest 10% of wind speeds. At NGL this reduced the observed diurnal amplitude from 43 to 32 ppm. At LEF and TPJ, concentration data are available at a number of heights and it is not always easy to determine which heights are best compared with which model levels. At LEF the 11 m level is hard to simulate since most models do not have sufficient vertical resolution near the ground. At TPJ, the amplitude at 62 m (23 ppm) is shown in Figure 1 but the amplitude at 29 m is 62 ppm, which would be at the upper end of the range of modeled amplitudes. The TPJ simulations are analyzed further in section 5.2.
 Although there is large model spread shown in Figure 1, this does not mean that we have little model skill in simulating diurnal amplitudes; most give a reasonable simulation of the relative amplitude across sites. This can be seen in Figure 2, which shows the linear fit between modeled and observed diurnal amplitude for the sites below 800 m for each model. Most models overestimate low-amplitude sites and underestimate high-amplitude sites. Coastal sites are responsible for some of the overestimates. The regional models DEHM and REMO give the closest match to the one-to-one line while COMET (also regional but only two layers), and CDTM and IMPACT give the worst agreement. The relatively high modeled amplitudes for COMET (a Lagrangian model) can be explained by the assumption in this version of the model that all receptor points are in the surface (PBL) layer, which may be lower than the measurement level. The scatter around the fitted line is indicated by the R² value (given as part of the model label) and ranges from 0.17 to 0.60. For some models the low R² is due to one or two sites with large model-observed mismatches. For example, R² for IFS increases from 0.43 to 0.75 if HVF and TPJ are removed from the fit.
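A per-model fit of this kind can be sketched with ordinary least squares, as below. The function name and the choice of observed amplitude as the independent variable are assumptions, and the numbers in the usage example are invented for illustration, not values from Figure 2.

```python
def linear_fit(observed, modeled):
    """Ordinary least-squares line modeled = slope * observed + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the squared
    correlation coefficient between the two amplitude lists.
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(modeled) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in modeled)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, modeled))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Invented amplitudes (ppm) for four hypothetical sites.
example_fit = linear_fit([10.0, 20.0, 30.0, 40.0], [15.0, 20.0, 25.0, 30.0])
```

A slope below one together with a positive intercept, as in this invented example, corresponds to the pattern described above: low-amplitude sites are overestimated and high-amplitude sites are underestimated.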
5.1.1. Horizontal Sampling Case Study
 Mikawa-Ichinomiya (MKW) is located in central Japan about 60 km southeast of Nagoya in a suburban area surrounded by mountains to the north and northeast. Figure 3 shows modeled and observed mean diurnal cycles for summer (JJA). Since the seasonal cycle was fitted and removed first, each plotted diurnal cycle is centered on zero. In general, the shape of the diurnal cycle produced by the models agrees with that observed, with increasing concentrations through the night and relatively uniform, low concentration during the day. However, the amplitude varies widely, as was seen in Figure 1.
 The diurnal amplitude in concentration is dependent on the input fluxes for each model (indicated by the line color and style in Figure 3). In general, models with smaller input fluxes (e.g., blue lines) give smaller diurnal amplitude. This is true of both biospheric and fossil components (solid lines of a given color tend to show smaller amplitudes than dashed lines of the same color). At this location the constant fossil flux contributes around 30–40% of the diurnal amplitude due to trapping of the fossil signal at night compared to the day. The variation in input fluxes between models is due to the choice of sampling location and how the input fluxes were regridded in each model to those locations. While most models cluster around the site location, those with lower horizontal resolution are 2–3° away, and three models sampled locations that were predominantly ocean. Four models that sampled locations south of the site used zero biospheric flux because of the proximity of the coast, which was clearly inappropriate for this site.
 Many sites show evidence of similar sampling issues to those found for MKW. Problems occur not only for sites in coastal areas but also for sites in regions with heterogeneous fluxes. Sampling location information has been provided for each model submission and should clearly be considered in any comparisons that are made. The horizontal resolution of a model determines both the choice of grid points available to represent a site and the resolution of input fluxes. The higher the model resolution, the more likely it is that a site can be appropriately represented.
5.1.2. Vertical Sampling Case Studies
 In Figure 1 we found that the amplitudes at mountain sites were underestimated when modelers submitted a nonsurface model level that they thought was representative of the sampling altitude or overestimated if the surface level was submitted. Mountain sites have often been chosen for CO2 sampling because they provide clean records with less contamination by local sources. However, in a global transport model, this usually means the site is not representative of the grid cell within which it is located; usually, the model topography is lower than the site altitude, and the gridded fossil and biospheric fluxes input to the model may also be larger than would be expected for a sparsely vegetated mountain location. The usual solution has been to sample a nonsurface model level, but choosing an appropriate level is difficult. This has been noted before [e.g., Geels et al., 2007].
 Here we are able to explore the decay in diurnal amplitude with height since data were submitted for all model levels to around 500 hPa. We also expect the phase of the diurnal cycle to lag as we move away from the surface. Figure 4 shows amplitude plotted against a measure of the phase for two sites, CMN and SNB, at each model level with amplitude greater than 1 ppm. For the measure of phase we use the time when the concentration changes from positive to negative (the diurnal cycles are centered around zero because the mean seasonal cycle has been removed). This typically happens during the morning as high nighttime concentrations are mixed through the atmosphere and daytime photosynthesis reduces concentrations. Each line shows the results from one model. Models typically show high amplitudes and low zero crossing times at model levels close to the surface and smaller amplitudes and later times aloft. Surface amplitudes vary substantially between models, in part due to model vertical resolution.
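The amplitude and phase measures used for Figure 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each model level is given as a mean diurnal cycle of 24 hourly values centered on zero, index 0 corresponds to the first hour of the day, and the crossing time is refined by linear interpolation between hourly values (the interpolation is our assumption, not a documented step). The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def zero_crossing_hour(cycle: Sequence[float]) -> Optional[float]:
    """Fractional hour at which the cycle first changes from positive to negative."""
    n = len(cycle)
    for i in range(n):
        current, following = cycle[i], cycle[(i + 1) % n]
        if current > 0 and following <= 0:
            # Linear interpolation between the two hourly values (assumption).
            fraction = current / (current - following)
            return (i + fraction) % n
    return None  # The cycle never changes sign in that direction.


def amplitude_phase_by_level(
    levels: Sequence[Sequence[float]], min_amplitude: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (amplitude, zero-crossing hour) for levels with amplitude > min_amplitude."""
    points: List[Tuple[float, float]] = []
    for cycle in levels:
        amplitude = max(cycle) - min(cycle)
        crossing = zero_crossing_hour(cycle)
        if amplitude > min_amplitude and crossing is not None:
            points.append((amplitude, crossing))
    return points
```

Plotting the returned pairs for successive levels of one model gives one line of the kind shown in Figure 4.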
 While the model behavior is broadly similar between models and across sites, the observations are not always consistent with any model level. At Mt Cimone (CMN), the observed zero crossing is much earlier than that of any model level with a similar amplitude. The observed diurnal cycle also indicates more uniform nighttime concentrations than the modeled cycles, which tend to increase in concentration through the night. For this site, it is clear that the diurnal cycle is not helpful for selecting a model level to compare with observations. It also suggests that comparisons with observations should only be made for daily or longer time averages, and possibly for only part of the diurnal cycle, e.g., daytime or nighttime.
 The Sonnblick (SNB) observations show a consistent amplitude and phase with at least some of the models. From the submitted meteorological data for each model, we checked the pressure of the model levels where there was reasonable consistency with the observations. The typical best model level was around 800–850 hPa compared to around 700 hPa for the altitude of SNB. This suggests that the optimum model sampling level is lower in the atmosphere than the altitude of the site would suggest. The seasonal cycle also decays with height. A useful extension to this study would be to test whether the best model level for sampling the diurnal cycle also gives an acceptable simulation of the seasonal cycle.
 To a lesser degree, the choice of sampling level also contributes to the model spread in Figure 1 for lower-altitude sites (500–1500 m). For example, at PAL (560 m), amplitudes calculated from surface layer concentrations were generally too high while those from levels chosen to represent the site altitude were too low. As for the higher-altitude sites, an intermediate level would give the best match, assuming that the input fluxes in this area are realistic.
5.2. Day-to-Day Variability of Diurnal Amplitude
 One of the difficulties in comparing modeled and observed diurnal amplitude in concentration is knowing how well the model input fluxes represent the location where the comparison is being made. While we may need to be cautious about using flux tower measurements to compare with modeled grid box averages, flux towers do provide an opportunity where both fluxes and concentrations are “known”. Consequently it seems worthwhile to test how well the models can represent the mean diurnal cycle at these sites and any variability in the diurnal cycle.
 We examine two sites, one in the tropics (TPJ) and one at high latitude (BOR). Figure 1 showed that modeled diurnal concentration amplitudes at TPJ were almost always larger than observed, while at BOR, most models gave a smaller amplitude than observed. It does not appear that these differences can be attributed to the input fluxes. At TPJ, estimates of net ecosystem exchange (NEE) [Hutyra et al., 2007] show a mean diurnal cycle of JJA fluxes very similar to that input to the models, with a peak-to-peak flux amplitude of 25 μmol m−2 s−1 compared to 19–28 μmol m−2 s−1 (median 23) for the models. There is a small phase difference between the model input fluxes and those observed; the observed fluxes decrease rapidly after 1000 UT but this decrease begins 1–2 h earlier in the model input fluxes. At BOR the flux tower NEE [Dunn et al., 2007] for JJA shows similar fluxes to those input to most models during the night and slightly smaller (less uptake) fluxes during the day. The peak-to-peak amplitude from the flux tower is 10 μmol m−2 s−1 compared to 7–14 μmol m−2 s−1 (median 12) for the models.
 The difference in mean modeled and observed diurnal amplitude in concentration, despite the reasonably good representation of the local fluxes, suggests that part of the variability of diurnal amplitude may be poorly simulated. For example, the models may struggle to simulate the large diurnal amplitudes in concentration observed under very stable conditions. Figure 5 shows the cumulative distribution function (cdf) of diurnal amplitude for June–August for observed and modeled CO2. The amplitude for each day was determined by taking the difference between that day's concentrations at the two times of day when the annual average diurnal cycle reaches its maximum and its minimum. This method was chosen to try to avoid interpreting synoptic changes as diurnal ones. The cdf plots the proportion of daily amplitudes less than a given value, with the shape of the line giving an indication of any skew in the distribution.
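The daily amplitude measure and the cdf described above can be sketched in Python as follows. The sketch assumes complete hourly records for each day, takes the maximum and minimum times from the same calendar day, and uses an empirical cdf that reports the proportion of values less than or equal to each amplitude. These choices, and the function names, are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def daily_amplitudes(
    days: Sequence[Sequence[float]], annual_mean_cycle: Sequence[float]
) -> List[float]:
    """Daily amplitude taken at fixed times of day.

    The times are those of the maximum and minimum of the annual average
    diurnal cycle, so a synoptic jump at another hour is not read as diurnal.
    """
    hours = range(len(annual_mean_cycle))
    hour_of_max = max(hours, key=lambda h: annual_mean_cycle[h])
    hour_of_min = min(hours, key=lambda h: annual_mean_cycle[h])
    return [day[hour_of_max] - day[hour_of_min] for day in days]


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Sorted (amplitude, cumulative proportion) pairs for plotting a cdf."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]
```

A long upper tail in the returned pairs corresponds to the positive skew discussed for Figure 5.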
 Positive skew is seen in the distribution of observed diurnal amplitude at the high-latitude site, BOR (Figure 5a), with a rapid increase in amplitude (from 30 to 80 ppm) in the top 15% of the distribution. All but two models (CCSR_NIES2 and DEHM) also produce an amplitude distribution with positive skew, but for most models the amplitudes are smaller than those observed. The underestimate occurs through most of the distribution, indicating that problems with the model simulation are not confined to difficulties simulating very shallow nocturnal boundary layers. The models giving diurnal amplitudes closer to those observed (e.g., IFS, NIES05, REMO, and TM5_nam1x1) are those with higher horizontal resolution; two other models (CCSR_NIES2 and DEHM) mostly give amplitudes larger than those observed but poorly represent the shape of the distribution, failing to simulate any of the very large amplitudes. The IFS model produces larger amplitudes than observed at the upper end of the distribution, presumably because of its shallow surface layer.
 Given the relatively good performance of some models at this site and before discussing the tropical site in Figure 5, it is worth checking whether the timing of the variations in diurnal amplitude is also reasonably simulated at BOR. Figure 6 shows the diurnal amplitude for 25 days in July and August 2002 for BOR. The models capture the low-amplitude period from day 211 to day 216 followed by the high-amplitude period through to day 223. The diurnal flux amplitude is large through both periods, indicating that transport dominates the concentration amplitude variability at this time. The rapid decrease in amplitude at day 224 in both the observations and the models coincides with a large decrease in flux amplitude on that day (in both observed and CASA input fluxes).
 The correlation (r) between modeled and observed diurnal amplitude for the whole JJA period for BOR is between 0.30 and 0.71 with almost half the models greater than 0.6. The models with the larger correlations are mostly, but not exclusively, those that gave diurnal amplitudes closer to those observed. TM3_fg is an example of a low-amplitude, high-correlation model (r = 0.65) while DEHM is an example of a high-amplitude, low-correlation model (r = 0.37). CDTM and PCTM.CSU also give correlations less than 0.4. For this site, the models capture at least some of the synoptic variations in observed diurnal amplitude. It would be useful to check whether this is also the case for other midlatitude and high-latitude sites.
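For reference, a correlation of this kind can be computed from the two series of daily amplitudes as in the Python sketch below. It assumes that r is the Pearson correlation coefficient and that days with a missing value (marked here as None) in either series are skipped. Both are our assumptions, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def pearson_r(
    modeled: Sequence[Optional[float]], observed: Sequence[Optional[float]]
) -> float:
    """Pearson correlation of daily amplitudes, skipping days with gaps."""
    pairs = [
        (m, o) for m, o in zip(modeled, observed) if m is not None and o is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two days with both values present")
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    covariance = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_m == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_m * var_o)
```

Because r is insensitive to a constant scaling of either series, a model can combine a low amplitude with a high correlation, as noted for TM3_fg.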
 The diurnal amplitude of observed CO2 does not seem to be as well modeled in the tropics as at high latitude. Figure 5b compares modeled and observed cdfs of June–August diurnal amplitude at TPJ in Brazil. The observations are shown for three tower heights, 20, 29, and 62 m, but the modeled distributions do not seem to compare well with any of these heights.
 The shape of most of the modeled distributions is closest to the 62 m case (leftmost red line) which is probably reasonable since this height is above the tree canopy and therefore more likely to represent conditions that the global models could simulate. However, the models generally produce larger amplitudes throughout the distribution than observed at the 62 m level. It is unlikely that the larger amplitude can be attributed to a mismatch in the observed and modeled sampling height (typically 30–50 m) since observed distributions of diurnal amplitude from two other tower heights at 39 and 50 m (not shown) are very similar to the 62 m distribution. The overestimated amplitude also does not appear to be due to problems with the phasing of the diurnal cycle such as might occur with delayed venting of the nocturnal boundary layer. Maximum modeled concentrations mostly occur slightly earlier than those observed at 20 and 29 m, with the 62 m observations lagging by an additional 1–2 h. The early modeled maximum is consistent with the earlier decrease in model input fluxes compared to those observed.
 A number of models have higher amplitudes at the bottom end of the distribution than the 20 m observations but only one model (IFS) has amplitudes exceeding the 20 and 29 m levels at the top of the distribution. Its distribution lies between the 20 and 29 m observed distributions for most of the amplitude range, suggesting that model performance can be improved with increased vertical resolution near the surface. The poor representation of the diurnal amplitude variability, as seen in the comparison of distributions, is confirmed when correlating observed and modeled diurnal amplitudes for each day from June to August. Most models, including IFS, give correlations below 0.2 regardless of which level the observed CO2 is taken from. Clearly, further work is needed to establish whether global model results can be downscaled to this type of observing site.
5.3. Influence of Model Vertical Resolution
 For a given diurnal cycle of flux, we might expect the resulting concentration signal in the lowest model level of the atmosphere to depend on the thickness of that surface layer; a thick surface layer would give a smaller amplitude cycle of concentration than a shallow surface layer. To test this, we have chosen seven continental, low-altitude sites. Figure 7 shows JJA diurnal peak-to-peak amplitude of concentration divided by a measure of the input flux at that location. The value we use is half the peak-to-peak amplitude of the CASA flux plus the fossil flux. The flux varies between models depending on how the prescribed fluxes were regridded and where each modeler chose to sample their model to represent the given site.
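The ratio plotted in Figure 7 can be written out explicitly, as in the Python sketch below. It reads "half the peak-to-peak amplitude of the CASA flux plus the fossil flux" as 0.5 × (CASA peak-to-peak amplitude) + (constant fossil flux), which is our interpretation of the wording. The function and argument names are hypothetical.

```python
from typing import Sequence


def concentration_to_flux_ratio(
    conc_cycle_ppm: Sequence[float],
    casa_flux_cycle: Sequence[float],
    fossil_flux: float,
) -> float:
    """Diurnal concentration amplitude divided by a measure of the input flux.

    Concentrations are in ppm and fluxes in umol m-2 s-1, so the result is in
    ppm per (umol m-2 s-1).
    """
    conc_amplitude = max(conc_cycle_ppm) - min(conc_cycle_ppm)
    casa_amplitude = max(casa_flux_cycle) - min(casa_flux_cycle)
    # Assumed reading: half the biospheric amplitude plus the constant fossil flux.
    flux_measure = 0.5 * casa_amplitude + fossil_flux
    if flux_measure <= 0:
        raise ValueError("flux measure must be positive")
    return conc_amplitude / flux_measure
```

Dividing by the locally sampled flux in this way removes much of the dependence on sampling location, so that remaining differences between models mainly reflect transport.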
 The results show that most models give similar ratios of 2–4 ppm per μmol m−2 s−1. CDTM gives lower ratios than the other models while DEHM, IFS, REMO, and the TM5 models give slightly higher ratios. The thickness of the surface layer can explain the CDTM and IFS results since these models have the thickest and thinnest surface layers, respectively. However, for models with intermediate thicknesses (midlevel height of 30–80 m), any relationship between the concentration-to-flux ratio and surface layer thickness appears to be weak. For example, there is a relatively small difference between the ratios in the two LMDZ models despite LMDZ_THERM having twice the vertical resolution of LMDZ. Presumably, differences in how near-surface mixing is simulated between models are as important as the vertical resolution.
 The range of ratios across sites varies between models with some models showing a large spread and others very little spread. Across most models there is a tendency for TPJ to give high ratios and FRD to give low ones. This seems surprising since we might anticipate more rapid vertical mixing in the tropics than at high latitudes and hence a lower concentration-to-flux ratio at TPJ than FRD. However, this appears to be a misconception. We checked the modeled boundary layer height for June–August at TPJ and FRD for a number of models and found that the TPJ heights were often lower than at FRD, especially at nighttime. This would explain the higher ratios found for TPJ.
5.4. Seasonal Variation of Diurnal Amplitude
 The seasonal variation of diurnal amplitude may be a useful diagnostic of seasonal changes in CO2 exchange with the biosphere. At middle and high latitudes, diurnal amplitudes will be larger during the growing season, while in the tropics the difference in wet and dry seasons may be detectable. In Figure 8 we show the seasonal change in diurnal amplitude for 2002 at four sites, chosen because they show interesting differences between the models and observations. Since we have already found that the modeled amplitudes can vary widely at a given site, we plot the amplitude of the monthly mean diurnal cycle relative to the mean of the 12 monthly amplitude values for each model and refer to this as the normalized amplitude. The observations are treated likewise. At Fraserdale (Figure 8a), the models show a rapid increase in normalized amplitude between May and June and a rapid decrease between August and September. In both cases the timing of the change is earlier in the models than the observations. The maximum amplitude occurs in July or August depending on the model. The models tend to overestimate the normalized amplitude in these months compared to the observations.
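The normalization used for Figure 8 amounts to dividing each monthly amplitude by the mean of the 12 monthly values. A minimal Python sketch follows, assuming one amplitude per calendar month with no gaps. The function name is hypothetical.

```python
from typing import List, Sequence


def normalized_amplitudes(monthly_amplitudes: Sequence[float]) -> List[float]:
    """Monthly diurnal amplitudes relative to the mean of the 12 monthly values."""
    if len(monthly_amplitudes) != 12:
        raise ValueError("expected one amplitude for each of the 12 months")
    mean_amplitude = sum(monthly_amplitudes) / 12
    if mean_amplitude <= 0:
        raise ValueError("mean amplitude must be positive")
    return [amplitude / mean_amplitude for amplitude in monthly_amplitudes]
```

By construction the normalized values average to one, so models with very different absolute amplitudes can be compared on a single axis.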
 The observed seasonal change in diurnal amplitude is quite different at Mace Head (Figure 8b), with two peaks in May and September. Since Mace Head is a coastal site, it is likely that the monthly change in amplitude has a meteorological component in addition to any changes in the surface flux. The lower amplitudes in June and July would presumably indicate a greater incidence of baseline (oceanic) conditions compared to May and September. For the model results, simulated concentrations were submitted for an onshore and offshore grid box. For most models the offshore submission gave smaller absolute diurnal amplitudes than those observed, while the onshore submission gave larger amplitudes than observed. Here we have chosen to average the MHD and MHDOCN submissions before calculating the normalized amplitude shown in Figure 8b. The models generally show a three-peak rather than two-peak structure, with many models giving relatively high amplitudes in July, unlike the observations. Where models do give lower values between June and August, this is due to the contribution from the MHDOCN submission, which would tend to confirm the role of meteorology in determining the diurnal amplitude at this site.
 Neuglobsow (Figure 8c) is situated by a lake in a forest environment in northern Germany. Figure 8c shows a broad maximum amplitude from June to August in the observations and a double peak in amplitude in May and August in most models. The diurnal amplitude of the CASA input fluxes is largest in May, consistent with the maximum in the modeled amplitudes. By contrast, the August diurnal amplitude in the CASA input fluxes is slightly smaller than for June or July, so the modeled maximum in August is harder to explain. Analysis of one model (IFS) suggests that lower wind speeds and shallower boundary layers in August, compared to June–July, may be responsible. The relative overestimate of the amplitude in May is a feature that is seen at many of the European sites, which suggests that the growing season starts too vigorously in the CASA input fluxes throughout this region. This may be due to how the NDVI data are used in determining the CASA fluxes. The May peak is not found when the CASA tracer is replaced by the SiB one.
 The final panel in Figure 8 shows the normalized diurnal amplitude at the tropical site, Tapajos. Here there is less seasonal variation in amplitude through the year, and the year shown here, 2002, may not be representative of all years. While there is some scatter in the model results, there is a tendency for lower amplitudes around February and October–November and larger amplitudes around May to August. This does not seem to clearly follow the seasons since the larger amplitudes occur in the transition from the wet to dry season. The observed normalized amplitude is shown for three sampling levels. The models agree reasonably well with the seasonal changes in observed diurnal amplitude at the 20 and 29 m levels but not with the 62 m level.
6. Further Work and Data Accessibility
 The analysis of the diurnal cycle presented here is clearly only a small subset of the analysis that could be performed with the data set generated by this experiment. We have not attempted to analyze all the submitted tracers nor all the submitted sampling locations. A second overview paper focuses on synoptic variations (P. K. Patra et al., 2007) but we also welcome other studies using this data set. Information on how to access the data is available on the TransCom Web site (http://www.purdue.edu/transcom/T4_continuousSim.php).
 The TransCom experiment presented here has generated a valuable data set for comparing modeled CO2 with in situ measurements. It has also become a useful benchmark test for modelers to assess model changes and has been responsible for identifying and fixing a number of model bugs and weaknesses. The analysis of the diurnal CO2 cycle has highlighted the importance of knowing where the transport model has been sampled to represent any given measurement site. Differences in sampling location and input fluxes between models accounted for some of the difference between model simulations. Plausible simulations of the observed diurnal cycle are only possible when the fluxes input to the sampled grid cell are realistic. For coastal sites or sites in regions of heterogeneous fluxes, this should be more achievable for transport models running with higher horizontal resolution. Correctly sampling a model to represent a site at moderate to high altitude remains a challenge. Our analysis showed that a model level somewhat lower than the true altitude of the site would usually improve the representation of the diurnal cycle, but the results were quite variable across models.
 Once differences in sampling locations and input flux have been accounted for, our analysis has shown that most models show similar strengths and weaknesses when compared with observations. None of the comparisons showed any obvious differences in the performance of online compared to offline models. Overall, the results suggest that more detailed analysis would be required to assess how current atmospheric models need to be improved (e.g., in their representation of vertical mixing) to allow the inclusion of the full diurnal cycle of CO2 observations in flux inversions. However, there is clearly valuable information in the diurnal records, e.g., the synoptic and seasonal changes in amplitude, for which we show some model skill and which may indicate a way forward.
 Maintaining continuous CO2 observation records requires dedicated principal investigators, research teams, and support staff. We wish to acknowledge all of this effort and thank those who made their data available for this study. Mt Cimone CO2 data were provided by the Italian Air Force Meteorological Service. CO2 measurements at many of the European locations including Hegyhatsal are sponsored by the CarboEurope project. Mace Head CO2 data are part of the ORE-RAMCES monitoring network coordinated by LSCE/IPSL. La Jolla and Trinidad Head CO2 data are from the group of C. D. Keeling, now run by R. F. Keeling. The Boreas measurements were supported by the U.S. National Aeronautics and Space Administration (NAG5-11154, NAG5-7534, NAG5-2253). An experiment such as this generates a large model data set. Many thanks to Kevin Gurney and the Department of Earth and Atmospheric Sciences at Purdue University for data handling and ftp hosting. Cathy Trudinger provided helpful comments on the manuscript. Individual modeling groups acknowledge the following support. CCAM: Part of this work was supported through the Australian Greenhouse Office. We thank John McGregor and Eva Kowalczyk for their development of CCAM. CHIMERE is a model developed by IPSL, INERIS, and LISA. Part of the implementation of CHIMERE-CO2 has been supported through the French Environment and Energy Management Agency (ADEME) and the French Atomic Energy Commission (CEA). DEHM: Part of the work has been carried out within the CarboEurope-IP project funded by the European Commission. IFS: the work has been funded by the EU's GEMS project SIP4-CT-2004-516099. LLNL: The LLNL portion of this work was performed under the auspices of the U.S. Department of Energy (DOE) by the University of California, Lawrence Livermore National Laboratory (LLNL) under contract W-7405-Eng-48. The project (06-ERD-031) was funded by the Laboratory Directed Research and Development Program at LLNL.