We evaluate the transport of three-dimensional chemical transport models in the upper troposphere and lower stratosphere using observed distributions of CO2 and SF6. The data consist of high-resolution in situ observations, obtained during all seasons at subtropical, middle and high latitudes over Western Europe within the SPURT (Spurenstofftransport in der Tropopausenregion) project (2001–2003). We show that the combination of the two passive tracers SF6 and CO2, with their different tropospheric characteristics, and the propagation of the temporal trends of these two gases into the lower stratosphere is a powerful diagnostic for evaluating model transport. The model evaluation shows that all models are able to capture the general features of the tracer distributions, including the vertical and horizontal propagation of the CO2 seasonal cycle. However, the modeled CO2 cycles are a few months out of phase in the lowermost stratosphere due to tropospheric mixing. Two models show a too strong Brewer-Dobson circulation, causing an overestimation of the tracers in the lowermost stratosphere during winter and spring. One model displays a too strong tropical isolation, leading to an underestimation of the tracers in the lowermost stratosphere during winter. All models suffer to some extent from diffusion and/or too strong mixing across the tropopause. In addition, the models show too weak vertical upward transport into the upper troposphere during the boreal summer. Sensitivity studies show that our initial conditions and boundary constraints are realistic and that a horizontal resolution higher than 2 degrees and an increase of the meteorology update frequency (from 6-hourly to 3-hourly) have negligible impact on the modeled CO2 and SF6 distributions.
1. Introduction
 In recent years global Chemistry-Transport Models (CTMs) and more recently Chemistry-Climate Models (CCMs) have been used to study the distribution of chemical species in the UT/LS (Upper Troposphere/Lower Stratosphere) on a global scale. The UT/LS region is important with respect to the atmospheric chemical and radiative budgets, whereby the distributions of ozone [e.g., Lacis et al., 1990] and water vapor [e.g., Forster and Shine, 1997] play a key role. The chemical lifetimes of radiatively active tracers are relatively long in this region, causing transport to dominate over chemical processes, whereby the tropopause is a strong barrier to isentropic mixing that causes significant gradients in the concentration of trace species between the troposphere and the stratosphere.
 However, the transport processes in this region are complex and our understanding is still poor. It is well known that large-scale dynamical processes dominate net exchanges from troposphere to stratosphere in the tropics and from stratosphere to troposphere in the extratropics [Haynes et al., 1991; Holton et al., 1995]. However, the net exchange alone does not determine the tracer distributions in the UT/LS region. These distributions are to a large extent affected by small-scale dynamical processes, such as convection and turbulence associated with frontal activity, which cannot be explicitly resolved by CTMs or CCMs. The main challenge for modeling transport in the UT/LS is the correct representation of the relevant prevailing dynamical processes.
 An important question is how to effectively evaluate the representation of the UT/LS in the model, given the fact that detailed trace gas observations are relatively limited. During the workshops on validation of Chemistry-Climate Models (CCMVal) this aspect has been recognized as a highlight and a challenge [Eyring et al., 2004].
 A useful and widely used vertical coordinate of the UT/LS is potential temperature. The potential temperature level of 450 K represents the upper boundary of the UT/LS [Rosenlof et al., 1997]. The lower boundary is defined as the lowest potential temperature level that does not intersect the Earth's surface (i.e., approximately 310 K). Generally the extratropical UT/LS includes the lowest part of the stratosphere, referred to in this paper as the lowermost stratosphere (LMS). The LMS extends from the tropopause to the 380 K level, above which the stratosphere or “overworld” is located, and is characterized by transport processes on smaller spatial and temporal scales than the large-scale meridional stratospheric circulation known as the Brewer-Dobson (BD) circulation. The BD circulation is the overall effect of meridional overturning (residual circulation) and horizontal mixing. The lower stratosphere contains transport barriers in the subtropics [Plumb, 1996] and at the edge of the polar vortices [Schoeberl et al., 1992] associated with large inhomogeneities in temperature.
 All of these model experiments are well suited to evaluate the model performance in the UT/LS, but they have to be carried out with more or less complex chemistry schemes and, in the case of 222Rn/210Pb, with scale-dependent algorithms for wet scavenging and dry deposition. Moreover, when chemical tracers are used, an additional problem is how to separate chemistry and transport processes in the model to explain the discrepancies with the measurements.
 Another powerful tool for transport evaluation in the stratosphere is the concept of stratospheric mean age of air [Kida, 1983; Hall and Plumb, 1994], demonstrated for example in the NASA “MM2-Measurement and Models II” studies [Park et al., 1999; Hall et al., 1999]. However, it is in principle not applicable in the UT and it is not valid in the LMS, where the assumption of a single entry point from the troposphere into the stratosphere, i.e., the tropical tropopause, is violated.
 For all these reasons we decided to follow the approach of a model experiment made by Strahan et al.  using CO2 as a diagnostic tracer to evaluate the model transport in the UT/LS. Their study is further referred to as ST98. ST98 applied the conceptual framework of Boering et al. [1994, 1996] with the benefit that CO2 is chemically inert in the troposphere and stratosphere and thus relatively simple to implement in models. Passive tracers offer the great advantage that their distributions are only controlled by transport processes and that no chemistry is involved. ST98 further recognized the CO2 seasonal cycle and its propagation into the stratosphere as an important feature for model transport evaluation.
 We have extended and modified the ST98 study in various ways. We focus in much greater detail on the extratropical UT/LS. We use new observations with a higher temporal and spatial resolution than the ER2 data record used by ST98 in the extratropical UT and the LMS (further referred to as UT/LMS). In addition we have introduced another inert tracer, SF6. In particular the combination of both tracers CO2 and SF6 allows a unique examination of different transport pathways into the extratropical UT/LS and makes this model evaluation very powerful. Furthermore, we apply a much more direct model validation by comparing each measured data point with its temporally and spatially interpolated model counterpart. The models involved in this evaluation are all three-dimensional Chemistry-Transport Models (CTMs): TM5, TOMCAT, and SLIMCAT.
 In the next two sections we describe the principal tracer characteristics and the models. Section 4 contains the experimental setup, including a description of the initialization procedure and the boundary conditions. In section 5 we introduce the observations and section 6 discusses the results of the model evaluation. The work is finalized with the conclusions in section 7.
2. Characterization of CO2 and SF6
 The tropospheric sources and sinks of both tracers are located exclusively at the Earth's surface. SF6 has an atmospheric lifetime of about 3200 years [Ravishankara et al., 1993] with only anthropogenic sources in the troposphere and a photolytic sink in the mesosphere. Over the last two decades, the mixing ratio of SF6 in the troposphere has grown at a nearly constant rate of about 0.2 pptv a−1 to about 5.3 pptv on global average in January 2003. In the remote and free troposphere, the SF6 distribution exhibits no significant variability, but shows a meridional gradient due to the larger electrical power production in the northern hemisphere compared to the southern hemisphere.
 Carbon, in the form of CO2, carbonate, organic compounds, etc., is cycled between various reservoirs, such as the atmosphere, the oceans, and the marine and land biota. Similar to SF6, CO2 increases nearly linearly in the atmosphere (on average about 1.5 ppmv a−1 over the last decades) due to anthropogenic emissions, mostly fossil fuel burning and deforestation. However, in contrast to SF6, the increase of tropospheric CO2 mixing ratios is overlaid by a seasonal cycle, mainly driven by biogenic activity. The amplitude of the CO2 seasonal cycle in the troposphere is much larger in the northern hemisphere (more than ±10 ppmv in high latitudes) than in the southern hemisphere (less than ±1 ppmv in high latitudes). Even the averaged amplitude of about ±3 ppmv in the tropical lower troposphere is twice as large as the yearly growth rate. Thus, the tropospheric seasonal cycle is a dominant feature, which propagates upward through the tropopause into the LS and spreads out meridionally, as shown by, e.g., Boering et al. [1994, 1996]; ST98 and Andrews et al. [1999, 2001a].
3. Model Descriptions
 All three models are grid point Eulerian 3D CTMs using the same offline assimilated meteorology from ECMWF (European Centre for Medium-Range Weather Forecasts) to drive the transport. Applying a single meteorological data set ensures that the diagnosed differences between the models are the consequence of the representation of transport processes.
 TOMCAT and SLIMCAT differ only in the vertical coordinate and the calculation of vertical transport in the stratosphere. The comparison of both models provides insight into how a different vertical coordinate formulation impacts the tracer distributions.
 The TM5 model is more similar to TOMCAT, but TM5 differs in the vertical resolution, in the transformation of the meteorological data to the model grid [Bregman et al., 2003], and in convection and planetary boundary layer dynamics.
 The global Tracer Model TM5 is an extended version of the TM3 model. The model uses a Cartesian grid with longitude and latitude as horizontal coordinates and hybrid sigma-pressure (σ-p) levels as the vertical coordinate. The horizontal and vertical resolution can be freely selected, but the default horizontal resolution is 3° × 2° (longitude/latitude). The lid of the model is at 0.1 hPa, corresponding to the ECMWF 60-layer vertical grid. All calculations were performed with a vertical resolution of 45 layers, derived from the 60 layers and including all upper troposphere and stratosphere levels. It further uses mass flux tracer advection as described by Heimann and Keeling  and Heimann . In addition, a mass-conservative three-dimensional translation of the meteorological spectral fields to the Cartesian model grid has been applied [Segers et al., 2002; Bregman et al., 2003].
 A new approach, introducing an iterative procedure for tracer advection with locally adjusted time steps, is implemented in TM5 to solve Courant-Friedrichs-Lewy (CFL) violations [Krol et al., 2005]. The stratospheric tracer distributions were improved significantly by including iterative advection [Bregman et al., 2006]. For a detailed description of the new model version TM5 the reader is referred to Krol et al. .
 The TOMCAT CTM uses the same vertical coordinate system (σ-p) and has the same model top pressure (0.1 hPa) as TM5. In this study TOMCAT has a horizontal resolution of 5.6° × 5.6° (T21) with 24 vertical levels. The vertical resolution is ∼1.5-2 km in the lower stratosphere. The tracers are advected by conservation of second-order moments [Prather, 1986] and the convection is based on the mass flux scheme of Tiedtke . In contrast to TM5, a simplified complete mixing scheme is applied in the planetary boundary layer. The details of the applied convective boundary layer scheme are given by Wang et al.  and references therein.
 SLIMCAT was developed as a stratospheric CTM with the goal of making best use of the stratospheric forcing analyses then available, i.e., those of the UK Met Office [UKMO; Swinbank and O'Neill, 1994]. It differs fundamentally from the other two models by the application of a hybrid sigma-theta (σ-θ) vertical coordinate system. For the hybrid σ-θ levels, the definition of the model levels changes with altitude. Above a reference potential temperature level, θ0 = 350 K, SLIMCAT uses pure isentropic levels in the stratosphere (up to 3000 K) and sigma-pressure levels below. The vertical advection is derived from merged divergence (below 350 K) and heating rates (above 350 K). In these experiments, the net diabatic heating rates are calculated using the NCAR CCM radiation scheme [Briegleb, 1992], which gives a better representation of the vertical transport in the model [Feng et al., 2005].
 The applied horizontal resolution and number of vertical levels for SLIMCAT are the same as for TOMCAT. Also, the horizontal advection, convection and boundary layer dynamics are identical for both models. For a detailed description and discussion of the new unified SLIMCAT-TOMCAT CTM the reader is referred to the paper of Chipperfield .
4. Experimental Setup
 Both SF6 and CO2 have been simulated with the models for the time period 2000 to 2003. The results have been compared to the observations during the SPURT (Spurenstofftransport in der Tropopausenregion) project [Engel et al., 2006a].
 In addition to the model intercomparison, we performed several sensitivity experiments with TM5 applying different model configurations (see Table 1).
Table 1. Overview of the Different Models and Setups Used in This Study
 Assuming no chemical production and destruction processes of SF6 and CO2 inside the model domain, their sources and sinks are simply defined by the boundary constraints. For the surface, we followed the approach of ST98 by using observed surface concentrations. For the model top, prescribed mean age of air was used to avoid artificial tracer accumulation during the model integrations.
 In general, for CO2 and SF6 a very long spin-up time of at least 15 to 20 years would be needed to reach steady state in the stratosphere. However, we applied a realistic initial stratospheric distribution for both tracers to reduce the spin-up time. We will demonstrate that our approach allows a spin-up time of about 2 years. Since stratospheric measurements of SF6 and CO2 are limited, we used an instantaneous steady state SF6 field derived from a transient run of the middle atmosphere model KASIMA (Karlsruhe Simulation Model of the Middle Atmosphere) [Kouker, 1993; Kouker et al., 1999] for initialization. The vertical domain of KASIMA ranges from 10 km to 120 km with 63 layers. It has a horizontal resolution of 5.6° × 5.6° and includes mesospheric SF6 chemistry. For more details of the SF6 chemistry, see Reddmann et al. .
4.1. Construction of Initial Tracer Fields
 The initialization started on the 1st of January 2000 using a KASIMA SF6 field from a 5-year repetitive integration with 1990 ECMWF analyses [Reddmann et al., 2001]. We established our initial CO2 field from the mean age of air, which was obtained from the stratospheric SF6 fields [Kida, 1983]. We used the SF6 field of KASIMA and mean age of air to construct an initialization field for the whole model domain following the procedures below.
 A linear tropospheric trend for SF6 was applied as stratospheric input function at the tropical tropopause. In this simple case, the mean age (Γ) can be easily calculated from the equation
 Here χ(r, t) is the mixing ratio at a given location r and time t in the stratosphere. χ(Ω, t) is the mixing ratio at the surface, Ω, controlling the input into the stratosphere at a given time t. dχ(Ω, ɛ)/dɛ is the slope of the linear input function on the time interval ɛ = t – t0.
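 For a linearly increasing tracer, the relation described above takes the familiar lag-time form. Written with the symbols defined here, it reads as follows; this is a reconstruction from those definitions under the assumption of a strictly linear input function, not a transcription of the original equation (1):

```latex
\Gamma(r, t) \;=\; \frac{\chi(\Omega, t) - \chi(r, t)}{\dfrac{d\chi(\Omega, \varepsilon)}{d\varepsilon}}
\tag{1}
```

 In words, the mean age is the time by which the stratospheric mixing ratio lags the surface value, given the constant growth rate of the input.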
 In the tropics the surface instead of the tropopause was chosen as control level, simply because of the availability of SF6 and CO2 measurements.
 The resulting mean age field (see Figure 1) is the base for the stratospheric SF6 and CO2 initialization. Consistent transformation of the mean age field to mixing ratios of SF6 and CO2 is performed by equation (2) [Hall and Plumb, 1994]
 Here G(r, t∣Ω, t′) is the stratospheric transit time distribution (TTD), also called the age spectrum. For this purpose, G is defined in a convenient way as an Inverse Gaussian Distribution (IG) in terms of the mean age Γ and the width Δ, used in many different fields [e.g., Chikara and Folks, 1989; Seshadri, 1999]:
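 Equations (2) and (3) can be written as below, following the formulation of Hall and Plumb [1994]. This is a reconstruction from the symbol definitions in the text; the shorthand ξ = t − t′ for the transit time is introduced here only for readability, and Δ is taken to be the width in the sense of Hall and Plumb (so that the variance of the distribution is 2Δ²):

```latex
\chi(r, t) \;=\; \int_{-\infty}^{t} \chi(\Omega, t')\, G(r, t \mid \Omega, t')\, dt'
\tag{2}

G(r, t \mid \Omega, t') \;=\;
\sqrt{\frac{\Gamma^{3}}{4\pi \Delta^{2} \xi^{3}}}\;
\exp\!\left[-\,\frac{\Gamma\,(\xi - \Gamma)^{2}}{4 \Delta^{2}\, \xi}\right],
\qquad \xi = t - t'
\tag{3}
```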
 For the parameterization of the TTD, we apply Γ2/Δ = 0.7 as suggested by Hall and Plumb  and confirmed by Engel et al. . To verify that the initial mean age field is close to reality, a comparison is made with mean age profiles derived from balloon-borne measurements of CO2 and SF6 performed at mid- and high latitudes [Engel et al., 2002, 2006b]. The good agreement between these profiles and the model profiles derived from the initial stratospheric mean age distribution illustrates the good quality of the KASIMA tracer fields (see Figure 2).
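 The conversion from mean age to mixing ratio can be illustrated with a small numerical sketch. It is not the code used in this study. The function names, the integration cut-off of 30 years and the midpoint quadrature are all hypothetical choices. The width is set through the ratio of moments in the form Δ²/Γ = 0.7 years, which is how this parameterization is usually quoted; reading the ratio in the text this way is an assumption.

```python
import numpy as np

def inverse_gaussian_spectrum(transit, mean_age, ratio=0.7):
    """Inverse Gaussian age spectrum G for transit times > 0 (in years).

    mean_age is Gamma; the width follows from Delta**2 = ratio * Gamma
    (assumed reading of the ratio of moments, see lead-in).
    """
    width_sq = ratio * mean_age
    prefactor = np.sqrt(mean_age**3 / (4.0 * np.pi * width_sq * transit**3))
    exponent = -mean_age * (transit - mean_age) ** 2 / (4.0 * width_sq * transit)
    return prefactor * np.exp(exponent)

def mixing_ratio_from_age(surface_series, t_now, mean_age, ratio=0.7,
                          t_max=30.0, n=60000):
    """Convolve a surface time series with the age spectrum (midpoint rule).

    surface_series: callable returning the surface mixing ratio at time t (years).
    t_max, n: hypothetical cut-off and number of quadrature points.
    """
    dt = t_max / n
    transit = (np.arange(n) + 0.5) * dt          # midpoints of the transit-time bins
    g = inverse_gaussian_spectrum(transit, mean_age, ratio)
    g /= np.sum(g) * dt                          # renormalise the truncated spectrum
    return float(np.sum(surface_series(t_now - transit) * g) * dt)

def sf6_surface(t):
    """Idealised linear SF6 record: 5.3 pptv in January 2003, 0.2 pptv per year."""
    return 5.3 + 0.2 * (t - 2003.0)

# For a linear input the result should be close to the lag-time value
# sf6_surface(2003.0 - 4.0), independent of the width of the spectrum.
sf6_at_age_4 = mixing_ratio_from_age(sf6_surface, 2003.0, mean_age=4.0)
```

 For a tracer with a seasonal cycle such as CO2, the same convolution damps the cycle as the mean age grows, which is why the full spectrum rather than a simple time lag is needed for the CO2 initialization.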
 Tropospheric CO2 and SF6 fields were initialized from the surface up to a pressure of 300 hPa with zonal mean volume mixing ratios derived from measurements [GLOBALVIEW-CO2, 2004; NOAA/CMDL, 2004]. Between 300 hPa and 240 hPa (which is the lowest pressure level of the KASIMA model) we have applied trilinear interpolation to avoid too strong discontinuities between the tropospheric and stratospheric initialization.
 To verify the validity of our initialization method, we performed a sensitivity study with TM5 (see Table 1) to examine the impact of the initial stratospheric conditions on the tracer distribution in the UT/LS region. For this purpose all values of the initial stratospheric mean age field derived from KASIMA were varied by ±1 year, which is considerable in terms of SF6 mixing ratios. However, we found no significant differences in the troposphere and LMS after two years of model integration. Only above 50 hPa do the differences become non-negligible, although they are below a level of ±0.4 years in November 2001 and decrease to less than ±0.2 years at the end of the model experiment in January 2003. Because of the good agreement between our initialization and observed tracer fields (see Figure 2), we conclude that our initialization does not significantly influence the LMS tracer fields for the selected integration period.
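 The size of the ±1 year perturbation in terms of SF6 can be estimated with a one-line calculation. It assumes the linear tropospheric growth rate of about 0.2 pptv a−1 and the global mean of about 5.3 pptv quoted in section 2, and uses the fact that for a linear input the mixing ratio depends only on the mean age:

```latex
\Delta\chi_{\mathrm{SF_6}} \;\approx\; \frac{d\chi}{dt}\,\Delta\Gamma
\;=\; 0.2~\mathrm{pptv\,a^{-1}} \times (\pm 1~\mathrm{a})
\;=\; \pm 0.2~\mathrm{pptv}
\;\approx\; \pm 4\%~\text{of}~5.3~\mathrm{pptv}
```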
4.2. Boundary Constraints
 During the model integration the surface CO2 and SF6 are constrained by observed ground-based time series. We apply the reference boundary layer matrix (CO2-REFMBL) [Masarie and Tans, 1995] to create zonal mean input fields for CO2 on a daily basis (see Figure 3). This matrix is a data product of the cooperative global data integration project GLOBALVIEW-CO2 . The surface fields for SF6 (see Figure 4), on the same grid and time resolution as CO2, were derived from interpolation of measurements at 7 remote air stations of the NOAA/CMDL flask network [NOAA/CMDL, 2004], covering a latitudinal range from 89°S to 85°N. The ground level constraints of CO2 operate as a source and a sink, due to the strong seasonal and spatial variability of the tracer. In contrast, the surface constraints of SF6 can be regarded exclusively as a source, due to the constant growth rate in the troposphere.
 We emphasize that this approach may not yield a realistic horizontal distribution of CO2 and SF6 close to the surface, but given the relatively short tropospheric mixing timescales we will demonstrate that the tracer distributions are realistic in the tropopause region.
 We explored the possibility of using reliable and realistic 2D surface constraints instead of zonal means. This might be achievable for SF6, of which the surface emissions are closely tied to energy-related human activity and thus fairly well known, but not for CO2, due to the coupling with the biosphere and the oceans. Using a detailed CO2 emission scenario is ineffective given other relevant uncertainties. For example, Bian et al.  indicate that convective transport algorithms have an uncertainty of similar magnitude to that of different CO2 emission scenarios. They conclude that the balances between different processes (in this case convective transport and emissions) can obscure the physical nature of relationships within the system.
 CO2 and SF6 were fixed at the top of the model on the basis of the mean age of air fields derived from the KASIMA model. As for the initialization, the mean age was converted to mixing ratios using equations (2) and (3), based on the constrained time series in the tropical troposphere. The resulting top level constraints are therefore consistent with the ground level constraints and are on the same daily basis.
 We performed two integrations with TM5 including and excluding prescribed mean age at the model top. The differences in the tracer fields were negligible at pressure levels higher than 30 hPa.
5. Observations
 The observations used for our model evaluation were obtained from the SPURT project, which was part of the German AFO 2000 program. High quality measurements were performed for a number of tracers with different chemical lifetimes in the UT/LMS region covering a latitudinal range between 30°N and 80°N over Europe. Every season was probed twice during intensive campaigns over a period of 2 years. A detailed overview of the SPURT results, including technical details, is given by Engel et al. [2006a] and references therein.
 Due to instrumental failures SF6 was occasionally missing. For those periods, we derived SF6 from N2O observations based on the robust linear relationships observed between N2O and SF6. As an example, Figure 5 shows the linear N2O/SF6 relationship observed in the UT/LMS in May 2002. The relationships are truly linear, given that the standard deviations between measured SF6 and N2O-derived SF6 are equal to the statistical error given by the precision of both instruments. This approach has the additional advantage that the N2O measurements have a much higher time resolution and a slightly better precision than the SF6 measurements.
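 The gap-filling step can be sketched as an ordinary least-squares fit followed by a prediction. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the sample values are synthetic rather than SPURT data, and the actual fitting procedure used for the SPURT data is not specified in the text.

```python
import numpy as np

def fit_linear_relation(n2o, sf6):
    """Least-squares line sf6 = slope * n2o + intercept from paired samples."""
    slope, intercept = np.polyfit(np.asarray(n2o, dtype=float),
                                  np.asarray(sf6, dtype=float), 1)
    return float(slope), float(intercept)

def sf6_from_n2o(n2o, slope, intercept):
    """Estimate SF6 where only N2O was measured."""
    return slope * np.asarray(n2o, dtype=float) + intercept

def residual_std(n2o, sf6, slope, intercept):
    """Scatter of measured SF6 around the fitted line, to compare with instrument precision."""
    residuals = np.asarray(sf6, dtype=float) - sf6_from_n2o(n2o, slope, intercept)
    return float(np.std(residuals))

# Synthetic, exactly linear example values (ppbv for N2O, pptv for SF6).
n2o_obs = np.array([280.0, 290.0, 300.0, 310.0, 317.0])
sf6_obs = 0.02 * n2o_obs - 1.2
slope, intercept = fit_linear_relation(n2o_obs, sf6_obs)
sf6_filled = sf6_from_n2o([295.0, 305.0], slope, intercept)
```

 Comparing `residual_std` with the combined instrument precision mirrors the linearity argument made above: if the scatter is no larger than the measurement error, a straight line describes the relationship adequately.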
6. Model Evaluation
 We first discuss the monthly mean latitudinal cross sections for SF6 and CO2 calculated with TM5. Next we demonstrate how the SPURT observations of these two passive tracers can be used to investigate the representation of transport processes in the UT/LS by models, i.e., TM5, TOMCAT and SLIMCAT (section 6.2).
 The model evaluation was performed on a point-to-point basis along the flight track. That means that we only use the model data at the same times and locations as the measurements. To achieve this, instantaneous 3-hourly model tracer fields were interpolated in time and space by applying the Modified Shepard method [Renka, 1988].
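 The idea behind this interpolation can be shown with a simplified sketch. It uses compactly supported inverse-distance weights of the kind employed in modified Shepard schemes, but with constant nodal values instead of the local quadratic nodal functions of Renka's algorithm, so it is a reduced variant and not the implementation used here. The function name, the radius of influence and the assumption that coordinates are already scaled to comparable units are all hypothetical.

```python
import numpy as np

def shepard_interpolate(points, values, target, radius):
    """Weighted average of node values with weights ((R - d)+ / (R d))**2.

    points : (n, k) node coordinates, scaled to comparable units
    values : (n,) tracer values at the nodes
    target : (k,) coordinates of the measurement
    radius : radius of influence R; nodes farther away get zero weight
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(points - np.asarray(target, dtype=float), axis=1)
    on_node = d == 0.0
    if on_node.any():
        # The target coincides with a node: return that node's value directly.
        return float(values[on_node][0])
    w = (np.clip(radius - d, 0.0, None) / (radius * d)) ** 2
    if w.sum() == 0.0:
        raise ValueError("no nodes within the radius of influence")
    return float(np.sum(w * values) / np.sum(w))

# Four grid nodes around a measurement location (synthetic values).
nodes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tracer = [1.0, 2.0, 3.0, 4.0]
centre_value = shepard_interpolate(nodes, tracer, [0.5, 0.5], radius=2.0)
```

 In a point-to-point comparison the same weighting would be applied in the spatial dimensions for the two model output times that bracket the measurement, followed by a linear interpolation in time; that ordering is again an assumption of this sketch.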
6.1. Zonal Mean Distributions With TM5
 The zonal mean distributions illustrate the main characteristics of the tracers' distribution and the temporal evolution. Figures 6 and 7 show typical seasonal variations of the monthly mean latitudinal cross sections for SF6 and CO2 calculated by TM5 with the default (3 × 2, see Table 1) model setup. We have added a representation of the tropopause by introducing an artificial tracer T500, which is set to zero (mass mixing ratio) for pressures higher than 500 hPa and set to unity at potential temperature (θ) levels higher than 380 K. Hence, T500 is a proxy for the amount of stratospheric air. The T500 levels shown in Figures 6 and 7 are the levels 0.2, 0.4 and 0.6. These levels follow the observed tropopause remarkably well (see Figure 11) and thus give a realistic representation of the tropopause.
 The nearly linear increase in time of the tropospheric SF6 mixing ratios and the meridional gradient can clearly be identified in the lower troposphere (see Figure 6). Because of the long chemical lifetimes, the stratospheric tracer distributions are determined by the BD circulation, with upwelling in the tropics and downwelling in the extratropics, and horizontal mixing. Low values of the SF6 mixing ratio indicate old air due to long stratospheric transport times. The lowest mixing ratios in the stratosphere are located at high latitudes in the winter hemisphere polar vortex, where downward transport is most intense and the air is isolated from lower latitudes and the edge of the polar vortex. This isolation is represented by steeply sloped isopleths indicating a horizontal transport barrier. A second transport barrier visible in Figure 6 represents the edge of the subtropics.
 Note that SF6 increases in the stratosphere over the whole integration period. This is the consequence of the nearly constant linear growth of the tracer in the troposphere over the last two decades.
 In contrast to SF6, tropospheric CO2 is dominated by a seasonal cycle, which is driven by biogenic activity. The CO2 seasonal cycle is propagated vertically in the tropical troposphere into the UT/LS and horizontally to midlatitudes, most clearly visible in boreal summer (bottom left panel of Figure 7). The signal is large enough to cause a vertical counter gradient in the midlatitudes during this season. The presence of a natural “pulse” in combination with the vertical and horizontal propagation of this pulse demonstrates both the elegance of CO2 as a validation tracer and the complexity of transport in this region. It is a challenge for a global model to represent such transport features correctly. Figure 7 illustrates that TM5 is able to represent it qualitatively. Later, we will examine the performances of all models in more detail.
Figure 7 also shows that the amplitude of the CO2 seasonal cycle is damped during the transport into the stratosphere due to ongoing mixing processes. In the middle and upper stratosphere, the propagated seasonal cycle is smoothed out.
6.2. Model-Measurement Intercomparison
 For the model evaluation we used the default setup of TM5, but reduced the horizontal resolution to 6° × 4° (TM5_6 × 4) to be consistent with the applied horizontal grid resolution of TOMCAT and SLIMCAT. Later we will demonstrate that a decrease in the horizontal resolution from 3° × 2° to 6° × 4° has negligible impact on the modeled distributions of SF6 and CO2 in the UT/LS.
6.2.1. Time Series
 A large number of flights have been performed during SPURT. As an illustrative example, Figure 8 shows the comparison of the TM5, TOMCAT and SLIMCAT results with the observations from four flights, each covering a different season. For diagnostic reasons, the figure contains the vertical coordinate Δθ (K), indicating the distance to the local tropopause, defined by a value of 2 PVU (Potential Vorticity Units, 1 PVU = 10−6 K m2 kg−1 s−1). It has been demonstrated that Δθ is a useful transport diagnostic [e.g., Hoor et al., 2004]. The parameter was deduced from ECMWF analysis by calculating the difference between θ (potential temperature at the position of the aircraft along the flight path) and θTP (potential temperature at the tropopause).
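 The Δθ coordinate is simple to compute once temperature and pressure are known at the aircraft position and at the local tropopause. The sketch below assumes the standard dry-air definition of potential temperature with a reference pressure of 1000 hPa and κ = R/cp ≈ 0.2857; neither constant is stated in the text, and the function names are hypothetical. Locating the 2 PVU tropopause in the ECMWF analysis is not covered here.

```python
import numpy as np

P0_HPA = 1000.0   # reference pressure (assumed conventional value)
KAPPA = 0.2857    # R/cp for dry air (assumed; not stated in the text)

def potential_temperature(temp_k, pressure_hpa):
    """Potential temperature theta = T * (p0 / p)**kappa, in K."""
    temp_k = np.asarray(temp_k, dtype=float)
    pressure_hpa = np.asarray(pressure_hpa, dtype=float)
    return temp_k * (P0_HPA / pressure_hpa) ** KAPPA

def delta_theta(theta_aircraft, theta_tropopause):
    """Distance to the local tropopause in K; positive values lie above it."""
    return (np.asarray(theta_aircraft, dtype=float)
            - np.asarray(theta_tropopause, dtype=float))

# Synthetic example: aircraft at 220 K and 200 hPa, tropopause at 218 K and 250 hPa.
theta_ac = potential_temperature(220.0, 200.0)
theta_tp = potential_temperature(218.0, 250.0)
dtheta = delta_theta(theta_ac, theta_tp)
```

 Because Δθ follows the tropopause as it moves up and down with the synoptic situation, measurements from different flights and latitudes can be compared at the same distance above or below the tropopause.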
 The models are not able to simulate the observed small-scale variability and the sharp gradients of the observations, especially when the aircraft crossed different air masses horizontally, e.g., the flight on 23 August 2002. This is obviously a consequence of the limited model resolution compared to the measurements, which have a much higher vertical (∼10–100 m) and horizontal (∼1–2 km) resolution.
 Nevertheless, the model tracer fields follow the observations quite well and occasionally even very well, e.g., modeled CO2 for the flight on 11 November 2001. Note that the tropospheric mixing ratios of both tracers and their vertical profiles at the beginning and at the end of each flight are well reproduced by the models. However, some model deviations are noticeable. TM5 and TOMCAT overestimate the lowest SF6 and CO2 mixing ratios in the LMS, most prominently during the flight in May 2002. Further, SLIMCAT significantly underestimates SF6 and CO2 in the LMS during the flight on 19 January. In contrast, for the flight on 17 May both TOMCAT and TM5 overestimate both tracers, while SLIMCAT shows better agreement in the LMS. Another interesting feature of this flight is the good performance of TM5 during the first half of the flight compared to the other models. A similar but much weaker feature is also seen for the flight on 19 January. Below we will discuss the model performances in the different seasons in more detail.
6.2.2. Vertical Profiles
Figure 9 shows the modeled and measured SF6 and CO2 values from all SPURT campaigns relative to the distance to the local tropopause. The profiles are binned in Δθ-intervals with a width of 5 K. The median instead of the average was used to calculate the values for each bin to minimize the influence of spurious outliers of the measurements. The error bars, for clarity shown for TM5 only, indicate the simulated minimum and maximum values.
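 The binning described above can be sketched in a few lines. This is an illustration only: the function name and the synthetic values are hypothetical, and bin edges are assumed to sit at integer multiples of the 5 K width.

```python
import numpy as np

def binned_median_profile(delta_theta, values, width=5.0):
    """Median, minimum and maximum of `values` in delta-theta bins of `width` K.

    Returns a dict mapping each bin centre (K) to (median, minimum, maximum).
    """
    delta_theta = np.asarray(delta_theta, dtype=float)
    values = np.asarray(values, dtype=float)
    index = np.floor(delta_theta / width).astype(int)   # bin number of each sample
    profile = {}
    for k in np.unique(index):
        selected = values[index == k]
        centre = float((k + 0.5) * width)
        profile[centre] = (float(np.median(selected)),
                           float(selected.min()),
                           float(selected.max()))
    return profile

# Synthetic SF6-like values (pptv) with one outlier in the lowest bin.
dtheta_samples = [1.0, 2.0, 3.0, 7.0, 8.0]
sf6_samples = [5.0, 5.2, 9.0, 4.8, 4.6]
profile = binned_median_profile(dtheta_samples, sf6_samples)
```

 In the synthetic example the outlier of 9.0 shifts the mean of the lowest bin to 6.4 but leaves the median at 5.2, which is the reason for preferring the median when a few spurious measurements are present.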
 In the troposphere, the observed and simulated SF6 mixing ratios agree very well for all models, indicating realistic surface constraints. Relatively low values of SF6 are observed in the LMS during and after the phase of strongest stratospheric downward transport in winter and spring, whereas higher values are found during and after the phase of weakest downward transport in summer and autumn.
 In the LMS, TM5 and TOMCAT underestimate the vertical SF6 gradients. Especially in winter and spring, modeled SF6 decreases too slowly above Δθ = 20 K. This finding is consistent with the SF6 overestimation in the LMS shown in Figure 8 for 19 January and 17 May for both models. SLIMCAT underestimates SF6 in the LMS in January, but shows an excellent agreement in May. In general, SLIMCAT provides the lowest SF6 values of all models in the LMS. In summer and autumn TM5 and SLIMCAT reproduce the observed SF6 profiles very well in the LMS, while TOMCAT shows too high concentrations.
 CO2 has a more complex vertical distribution in the UT/LS, which is most obvious in the observed change in the vertical gradient between May (decrease with altitude) and August (increase with altitude). The mixing ratios in the troposphere are dominated by the seasonal cycle, with lowest values in August and highest values in May (see also Figures 3, 6 and 7). The LMS in August contains tropospheric remnants from spring with high CO2 values due to the propagation of the seasonal cycle.
 Strong convection during summer and early autumn is responsible for the high tracer variability observed in the upper troposphere in August and October. In general, CO2 exhibits much more variability in the free troposphere (above the planetary boundary layer) than SF6 during these seasons, because of the diurnal changes in the biogenic activity.
 We can benefit from the availability of both SF6 and CO2 to diagnose the results in terms of different transport pathways in the LMS, namely stratosphere-troposphere exchange or “fast” quasi-horizontal (isentropic) mixing versus “slow” downwelling by the BD circulation. Too strong cross-tropopause or horizontal mixing in the models would always yield too high calculated SF6 mixing ratios in the LMS (the opposite for too weak transport). However, the same transport deviation would result in either too low or too high CO2 values, due to the dominant seasonal cycle. This would lead to seasonally varying differences with observations.
 In contrast, too strong BD circulation would always result in too low CO2 and SF6, because the CO2 seasonal cycle is smoothed out for air coming from the overworld.
 However, one has to keep in mind that the midlatitude LMS is always subject to mixing with tropospheric air, which complicates the distinction between different transport pathways.
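 The cross-tropopause mixing part of this argument can be illustrated with a minimal two-reservoir sketch. It is not taken from any of the evaluated models. The LMS mixing ratio is written as a blend of recently mixed-in tropospheric air and aged stratospheric air that carries only the long-term trend. All numbers (growth rates, seasonal amplitude, transit times, tropospheric fractions) and all function names are illustrative assumptions, and the seasonal cycle is idealized as a single harmonic peaking in spring.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
def tropospheric_sf6(t_years):
    """Idealized SF6 (pptv): purely linear growth, no seasonal cycle."""
    return 4.8 + 0.22 * t_years

def co2_trend_only(t_years):
    """Idealized CO2 trend (ppmv) with the seasonal cycle smoothed out."""
    return 370.0 + 1.6 * t_years

def tropospheric_co2(t_years):
    """Idealized CO2 (ppmv): trend plus one harmonic peaking in spring."""
    seasonal = 5.0 * math.cos(2.0 * math.pi * (t_years - 4.5 / 12.0))
    return co2_trend_only(t_years) + seasonal

def lms_mixing_ratio(t, f_trop, trop_fn, aged_fn, lag_trop=0.1, age_strat=2.0):
    """Blend of tropospheric air (short lag) and aged stratospheric air."""
    return f_trop * trop_fn(t - lag_trop) + (1.0 - f_trop) * aged_fn(t - age_strat)

def mixing_bias(t, trop_fn, aged_fn, f_model=0.6, f_obs=0.4):
    """Model-minus-reference bias when the model mixes in too much tropospheric air."""
    return (lms_mixing_ratio(t, f_model, trop_fn, aged_fn)
            - lms_mixing_ratio(t, f_obs, trop_fn, aged_fn))

months = [k / 12.0 for k in range(12)]
sf6_bias = [mixing_bias(t, tropospheric_sf6, tropospheric_sf6) for t in months]
co2_bias = [mixing_bias(t, tropospheric_co2, co2_trend_only) for t in months]

for k, (ds, dc) in enumerate(zip(sf6_bias, co2_bias)):
    print(f"month {k + 1:2d}: SF6 bias {ds:+.3f} pptv, CO2 bias {dc:+.2f} ppmv")
```

 With these assumed numbers the SF6 bias is positive in every month, whereas the CO2 bias changes sign over the year, because the assumed seasonal amplitude exceeds the trend difference between the two reservoirs.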
 In January SLIMCAT significantly underestimates both SF6 and CO2 in the LMS. The good agreement of modeled and observed mean age of air at the 50 hPa level [Monge-Sanz et al., 2007] derived from SLIMCAT and ER2 observations [Andrews et al., 2001b], both calculated from CO2 mixing ratios, rules out that too weak BD circulation is the reason for the underestimated SF6 and CO2 mixing ratios in January. For this reason, the explanation must be the consequence of processes that occur between the upper boundary of the LS (450 K), approximately the level of the ER2 observations, and the upper boundary of the LMS (380 K), the upper level of the SPURT observations. This region coincides with the “tropical controlled transition layer” [Rosenlof et al., 1997], where the isolation of the extratropics from the tropics by the subtropical transport barrier is weakest in the stratosphere. The most likely explanation for these discrepancies in the LMS found in January is too strong isolation between these reservoirs in the tropical controlled transition layer during the period of autumn to winter in SLIMCAT.
 In May, all models overestimate CO2 in the LMS, while only SLIMCAT agrees well with the observed SF6. Hence, it can be ruled out that a too fast stratospheric overturning caused the overestimated CO2 values in the LMS, and horizontal mixing (across the tropopause) must have played a role. In fact, too strong cross-tropopause transport of tropospheric air leads to a disproportionate increase of CO2 relative to SF6 in the LMS, because the tropospheric CO2 seasonal cycle has its maximum during this period of the year in the northern hemisphere. The fact that TOMCAT and TM5 still overestimate SF6 is most likely due to the different vertical coordinate definition of both models compared to SLIMCAT, leading to different mixing intensity.
 The overestimated SF6 and underestimated CO2 found for TOMCAT in the LMS above Δθ = 20 K during October must be the consequence of overestimated cross-tropopause mixing during summer and autumn (low tropospheric CO2 values in NH) in the model.
6.2.3. Propagation of the Tropospheric CO2 Seasonal Cycle
 A key step in this model evaluation that highlights most of the findings discussed above is to examine if the model propagates the tropospheric CO2 seasonal cycle into the LMS accurately. For this purpose, we compare the observed and simulated CO2 seasonal cycle on different Δθ-intervals in Figures 10a–10d. Δθ has the same definition as in Figure 9. Negative Δθ levels indicate the troposphere and positive levels indicate the stratosphere. The dotted lines in all figures are derived from the observed reference marine boundary layer matrix, averaged over the latitude range from 35°N to 65°N, which is also applied for the surface constraints (see section 4.2).
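 The layer averaging behind such a comparison can be sketched as follows. This is a minimal illustration, not the processing code used for the figures. It assumes that Δθ is the potential temperature of a sample minus the potential temperature of the local tropopause, and that the layers are 20 K wide. The function names and the sample values are hypothetical.

```python
import math
from collections import defaultdict

def delta_theta_layer(delta_theta, width=20.0):
    """Return the (lower, upper) bounds in K of the layer containing delta_theta."""
    lower = math.floor(delta_theta / width) * width
    return (lower, lower + width)

def layer_monthly_means(samples, width=20.0):
    """Average CO2 per (layer, month).

    samples: iterable of (month, theta, theta_tropopause, co2) tuples,
    with theta in K and co2 in ppmv (assumed input layout).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for month, theta, theta_tp, co2 in samples:
        key = (delta_theta_layer(theta - theta_tp, width), month)
        sums[key][0] += co2
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical samples: two in the upper troposphere, one in the LMS.
example = [
    (5, 330.0, 340.0, 372.0),
    (5, 335.0, 340.0, 374.0),
    (5, 365.0, 340.0, 370.0),
]
means = layer_monthly_means(example)
for (layer, month), value in sorted(means.items()):
    print(f"month {month}, layer {layer[0]:+.0f} K to {layer[1]:+.0f} K: {value:.1f} ppmv")
```

 Applying the same function to model output sampled along the flight tracks would give layer means that are directly comparable with the observed ones.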
 The amplitude of the seasonal cycle has its maximum at low altitudes and at Δθ = -40 K (in the middle troposphere). Note that the amplitude is close to the tropospheric boundary values. The observed amplitudes in the UT [-20 K < Δθ < 0 K] and just above the tropopause [0 K < Δθ < 20 K] in the so-called “tropopause following transition layer” [Hoor et al., 2004] are damped, but their cycles are still in phase with the lower troposphere. The observed CO2 cycle above Δθ = 20 K is shifted by about 3 months.
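 Amplitude damping and phase shift between layers can be quantified by projecting each layer's time series onto the annual harmonic. The sketch below assumes twelve evenly spaced monthly means per layer, which is an idealization: the SPURT campaigns sample the year more sparsely, so a least-squares harmonic fit would be needed in practice. The function names and the synthetic series are hypothetical.

```python
import math

def first_harmonic(monthly_means):
    """Amplitude and peak position (month index, 0 = January) of the annual harmonic."""
    n = len(monthly_means)
    a = 2.0 / n * sum(y * math.cos(2.0 * math.pi * k / n)
                      for k, y in enumerate(monthly_means))
    b = 2.0 / n * sum(y * math.sin(2.0 * math.pi * k / n)
                      for k, y in enumerate(monthly_means))
    amplitude = math.hypot(a, b)
    peak_index = (math.atan2(b, a) * n / (2.0 * math.pi)) % n
    return amplitude, peak_index

def phase_lag(peak_a, peak_b, n=12):
    """Lag of series b behind series a in months, wrapped to (-n/2, n/2]."""
    lag = (peak_b - peak_a) % n
    return lag - n if lag > n / 2 else lag

def synthetic_cycle(mean, amplitude, peak_month, n=12):
    """Synthetic monthly means with a single annual harmonic (test data only)."""
    return [mean + amplitude * math.cos(2.0 * math.pi * (k - peak_month) / n)
            for k in range(n)]

troposphere = synthetic_cycle(372.0, 5.0, peak_month=4)   # peak in May
lms = synthetic_cycle(370.0, 1.5, peak_month=7)           # damped and delayed

amp_trop, peak_trop = first_harmonic(troposphere)
amp_lms, peak_lms = first_harmonic(lms)
lag_months = phase_lag(peak_trop, peak_lms)
print(f"amplitude ratio {amp_lms / amp_trop:.2f}, lag {lag_months:.1f} months")
```

 The amplitude ratio measures the damping and the lag measures the phase shift. Computing both for observations and for each model would turn the visual comparison of the cycles into two numbers per layer.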
6.2.3.1. Middle and Upper Troposphere
 In the middle and upper troposphere the observed CO2 seasonal cycle is reproduced fairly well by all models (see Figures 10b–10d) with the exception of August, when the seasonal minimum is underestimated. The observations suggest a strongly mixed troposphere up to the tropopause, most likely due to convection, which maximizes during summertime. Apparently the models underestimate the convective events. This is consistent with Olivié et al., who showed that TM5 convective fluxes were somewhat too weak. However, there is good qualitative agreement with the observations during the rest of the year.
6.2.3.2. Lowermost Stratosphere
 It is interesting that the behavior of TM5 in the range of 0 K to 20 K differs from that of the other two models. At this level, the TM5 seasonal cycle is out of phase with the troposphere and follows instead the phase-shifted cycle of the adjacent layer above. This means that the transport times from the troposphere into the tropopause following transition layer are longer in TM5, and thus the layer is, in contrast to the observations, more decoupled from the troposphere.
 In the LMS above the tropopause layer, represented by the two layers above Δθ = 20 K, TM5 matches the observed CO2 seasonal cycles, the phase and its shift relative to the tropospheric cycle. The model slightly underestimates the observed CO2 values in May, consistent with Figure 9. SLIMCAT and TOMCAT do not reproduce the seasonality of CO2 in these layers accurately. The calculated phase of the CO2 seasonal cycle in the LMS in both models follows the phase in the troposphere during the first half of the year. The tight connection between the troposphere and LMS demonstrates that the transport times from troposphere into the LMS are too short in both models.
6.3. Sensitivity Studies
6.3.1. Sensitivity to Different Advection Schemes
 In addition to second-order moments, we have applied the more diffusive first-order moments only in our advection scheme (a so-called “slopes” scheme) to examine the sensitivity of the results to the diffusivity of the advection scheme. Figure 11 shows the comparison of TM5 simulations with second-order moments (TM5_3 × 2) and slope advection (TM5_3 × 2_slope) with observations from two SPURT flights performed in winter and spring. Figure 11 also includes the comparison between Δθ and the artificial tracer T500. It is interesting to see the close similarity of T500 to Δθ, which demonstrates the usefulness of this tracer for determining the location of the measurements relative to the tropopause.
 The first-order “slope” advection scheme is too diffusive, yielding an overestimation of the CO2 mixing ratios of about 2 ppmv in winter to 4 ppmv in spring, while there are no significant differences for SF6. Apparently, the SF6 gradients are small enough that even the diffusive slope scheme is able to produce reasonable results, whereas CO2 cannot be simulated correctly. This examination illustrates that model transport evaluations depend on the prevailing spatial gradients of the tracer. CO2 is a sensitive tracer, since the seasonal perturbations create sufficient spatial gradients for a useful evaluation of the advection scheme. Furthermore, the comparison of both T500 time series reveals that the calculated amount of stratospheric air in the LMS is always less for slope than for second-order moment advection. This might point toward too strong cross-tropopause transport when first-order advection is applied.
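 The link between the order of an advection scheme and its numerical diffusion can be illustrated with a one-dimensional textbook analogue. The sketch below does not implement the slopes scheme or the second-order moments scheme used in TM5. As stand-ins it compares first-order upwind advection with the second-order Lax-Wendroff scheme on a periodic grid. The grid size, Courant number and initial wave are arbitrary assumptions.

```python
import math

def advect(q, courant, steps, scheme):
    """Advect q on a periodic 1-D grid with a constant positive Courant number."""
    n = len(q)
    for _ in range(steps):
        new = [0.0] * n
        for i in range(n):
            qm, q0, qp = q[i - 1], q[i], q[(i + 1) % n]
            if scheme == "upwind":            # first order, strongly diffusive
                new[i] = q0 - courant * (q0 - qm)
            elif scheme == "lax_wendroff":    # second order, weakly diffusive
                new[i] = (q0 - 0.5 * courant * (qp - qm)
                          + 0.5 * courant ** 2 * (qp - 2.0 * q0 + qm))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        q = new
    return q

def amplitude(q):
    """Half the peak-to-peak range of the tracer field."""
    return 0.5 * (max(q) - min(q))

n_cells = 64
courant = 0.5
steps = 128   # 128 steps at Courant 0.5 move the field once around the grid

# Sinusoidal tracer perturbation with four wavelengths across the domain.
initial = [math.sin(2.0 * math.pi * 4 * i / n_cells) for i in range(n_cells)]

amp_upwind = amplitude(advect(initial, courant, steps, "upwind"))
amp_lw = amplitude(advect(initial, courant, steps, "lax_wendroff"))
print(f"remaining amplitude: upwind {amp_upwind:.2f}, Lax-Wendroff {amp_lw:.2f}")
```

 In this analogue the lower-order scheme removes most of the wave amplitude after one revolution, while the higher-order scheme retains most of it. A tracer with weak gradients would be much less affected by the choice of scheme, which is the qualitative behavior described above for SF6 and CO2.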
6.3.2. Sensitivity to Different Horizontal Resolution and Meteorology Update Frequency
 In previous studies it was shown that increasing the update frequency of the applied meteorology fields from 6 h to 3 h significantly improved the modeled stratospheric tracer fields [Legras et al., 2005; Berthet et al., 2006; Bregman et al., 2006]. Here we investigate the impact for the UT/LS for the integration period considered. We also examined the effect of horizontal resolution by comparing the results of TM5_6 × 4, TM5_3 × 2 and TM5_3 × 2_3h simulations with SPURT observations.
 Surprisingly, neither the coarser horizontal resolution nor the higher meteorology update frequency has a notable impact on the modeled vertical distribution of both tracers in the UT/LS. The latter finding seems to contradict Legras et al., who found that the use of 3-hourly meteorology leads to much less diffusion, and thus to a better representation of small-scale variability. However, small-scale diffusion has a minor impact on the general CO2 and SF6 distributions in the UT/LS. Higher up in the stratosphere, the use of 6- and 3-hourly assimilated winds does produce different CO2 and SF6 distributions (not shown), in line with the findings in other studies.
 The sensitivity to horizontal resolution depends on the applied transport scheme. More diffusive advection schemes than the one of Prather do benefit from higher horizontal resolution [Strahan and Polansky, 2006]. In terms of tracer transport, more diffusivity can be seen as lower grid resolution. Our finding suggests a horizontal resolution threshold close to 1 to 2 degrees, which is close to the “effective” default resolution in TM5 when second-moment advection is used.
 In this study we present a detailed evaluation of extratropical UT/LS transport in the three different global 3D-offline CTMs, TM5, TOMCAT and SLIMCAT, using unique high-resolution airborne in situ observations of CO2 and SF6 derived from the SPURT project. This data set is the first that allows such a very detailed model evaluation in this region. A point-to-point comparison with observations has been made for every season and the propagation of the CO2 seasonal cycle is examined.
 For this model experiment we have developed a relatively simple setup that is easy to implement. Sensitivity runs showed that the boundary constraints are sufficiently realistic to simulate CO2 and SF6 distributions in the UT/LS. There is some bias in tropospheric CO2 by the zonal mean surface constraints, likely caused by uncertainties in the convection parameterization.
 The models yield quite reasonable agreement and capture the general seasonally varying features of both tracers. The models are also able to represent a vertical counter gradient of CO2, which is caused by vertical and horizontal propagation of the tropospheric seasonal cycle. Nevertheless, the comparison also reveals seasonally varying deviations in the modeled LMS tracer distributions. TM5 and TOMCAT overestimate CO2 and SF6 in the LMS during winter and spring. SLIMCAT underestimates CO2 and SF6 in the LMS in winter, due to too strong isolation between tropics and extratropics for the potential temperature range of 380 K to 450 K.
 All models suffer to some extent from enhanced cross-tropopause mixing, leading to less sharp vertical gradients than observed and time lags in the CO2 seasonal cycle in the LMS. TM5 deviations are somewhat smaller, most likely due to its higher vertical resolution.
 The sensitivity studies carried out with TM5 give some interesting results. First, the results are insensitive to horizontal resolution when increasing the resolution from 6° × 4° to 3° × 2°. Although this seems to conflict with the results from other model studies, we suggest the presence of a grid resolution threshold of about 1-2 degrees where long-lived tracer profiles become less vulnerable to resolution changes.
 Furthermore, the use of 3-hourly instead of 6-hourly meteorology does not affect the simulated passive tracer distributions in the UT/LS. Apparently, the small-scale variability does not affect average UT/LS SF6 and CO2 distributions. Higher in the stratosphere the use of 3-hourly data significantly improves the tracer distribution in line with other studies.
 We have demonstrated the usefulness of the combination of SF6 and CO2 for transport diagnostics. It is relatively easy to implement, and sensitive to diffusion and convection parameterizations. We encourage other global modelers to join this evaluation. The boundary data, including a description of how to use them and an example of the TM5 algorithm, are available and can be supplied upon request. Upcoming tracer observations may help to improve our boundary constraints further.
 This work is partly funded by the project SCOUT-O3 - Stratospheric-Climate Links with Emphasis on the UTLS under EC contract GOCE-CT-2004-50539, and the project SPURT under the AFO 2000 program of the German Ministry for Education and Research (BMBF). The authors acknowledge the Halocarbons and other Atmospheric Trace Species Group (HATS) from NOAA/ESRL Global Monitoring Division for the SF6 data set used to force the lower boundary of the model and Thomas Reddmann for the KASIMA SF6 data. We thank Horst Fischer and the anonymous reviewers for their useful comments and Arjo Segers for the great computational support. Finally we would like to thank the enviscope GmbH (Frankfurt a. M., Germany) and the Gesellschaft für Flugzieldarstellung (GFD) for the excellent co-operation and support during the aircraft campaigns.