The performance of 18 coupled Chemistry Climate Models (CCMs) in the Tropical Tropopause Layer (TTL) is evaluated using qualitative and quantitative diagnostics. Trends in tropopause quantities in the tropics and the extratropical Upper Troposphere and Lower Stratosphere (UTLS) are analyzed. A quantitative grading methodology for evaluating CCMs is extended to include variability and used to develop four different grades for tropical tropopause temperature and pressure, water vapor and ozone. Four of the 18 models and the multi-model mean meet quantitative and qualitative standards for reproducing key processes in the TTL. Several additional diagnostics, applied to a subset of the models, analyze the Tropopause Inversion Layer (TIL), the Lagrangian cold point and TTL transit time. The models simulate historical decreases in tropical tropopause pressure and in water vapor, lending confidence to future projections. The models simulate continued decreases in tropopause pressure in the 21st century, along with ∼1 K per century increases in cold point tropopause temperature and 0.5–1 ppmv per century increases in water vapor above the tropical tropopause. TTL water vapor also increases below the cold point. In two models, these trends are associated with 35% increases in TTL cloud fraction. These changes indicate significant perturbations to TTL processes, specifically to deep convective heating and humidity transport. Ozone in the extratropical lowermost stratosphere has significant, hemispherically asymmetric trends. O3 is projected to increase by nearly 30% due to ozone recovery in the Southern Hemisphere (SH) and to enhancements in the stratospheric circulation. These UTLS ozone trends may have significant effects in the TTL and the troposphere.
 The upper troposphere/lower stratosphere (UTLS) plays a key role in radiative forcing of the climate system and chemistry-climate coupling (see Shepherd  for a recent review). The tropical tropopause layer (TTL) sets the boundary condition for air entering the stratosphere [Brewer, 1949]. Since the tropical tropopause is itself not a transport barrier, it has come to be thought of as a layer of finite depth. We here regard the TTL as being synonymous with the tropical UTLS for the purpose of model validation. The TTL is the region in the tropics within which air has characteristics of both the troposphere and the stratosphere. Representing the TTL region accurately in global models is critical for being able to simulate the future of the TTL and the effects of TTL processes on climate and chemistry.
 The TTL is the layer in the tropics between the level of main convective outflow and the cold point tropopause (CPT), about 12–19 km [Gettelman and Forster, 2002]. The TTL has also been defined by Fueglistaler et al.  as a shallower layer between the level of zero clear sky radiative heating and the CPT (15–19 km). We will use the deeper definition of the TTL here because we seek to understand not only the stratosphere, but the tropospheric processes that contribute to TTL structure (see below). The TTL is maintained by the interaction of convective transport, convectively generated waves, radiation, cloud microphysics and the large-scale stratospheric circulation. The TTL is the source region for most air entering the stratosphere, and therefore the TTL sets the chemical boundary conditions of the stratosphere. Clouds in the TTL, both thin cirrus clouds and convective anvils, have a significant impact on the radiation balance and hence tropospheric climate [Corti et al., 2006].
 In this study we present quantitative evaluations of coupled Chemistry Climate Models (CCMs) in the TTL. We also present key historical trends in the TTL for model evaluation, and key future projections in the TTL and in the extratropical lowermost stratosphere (LMS), which may affect the TTL through rapid quasi-isentropic transport. This study builds on earlier work by Gettelman and Birner , who analyzed 2 models, and Gettelman et al. , who analyzed trends for 11 CCMs. Here we extend these works by performing a more quantitative set of model diagnostics using 18 updated models and by analyzing future trends. These CCMs were run for the CCM Validation 2 (CCMVal-2) project experiments as input to the 2010 World Meteorological Organization (WMO)/United Nations Environment Programme (UNEP) assessment of stratospheric ozone depletion. A companion paper on the extratropical UTLS by Hegglin et al.  also includes an assessment of model performance.
 The TTL is the source of most stratospheric air, and water vapor in the stratosphere is regulated by tropopause temperatures [Brewer, 1949]. Hence a correct representation of the TTL critically depends on a correct representation of tropical tropopause temperature and water vapor. The diagnostics also focus on variability in the TTL, including large-scale and long-term variability in tropopause temperature. The diagnostics are used to assess model skill, and quantitative grades are applied to some of them; these graded diagnostics can be used as metrics of model performance.
2.1. Models and Experiments
 The models and simulations used in this study are part of the CCM Validation round 2 (CCMVal-2) inter-comparison project. All of the models are coupled CCMs. A CCM is a General Circulation Model (GCM) of the atmosphere that includes prognostic chemical species that are used in the dynamical and thermodynamic equations of the model. Most importantly, chemically active ozone and water vapor are used in the GCM radiative heating equation. CCMVal-1 models have been documented by Eyring et al.  and results reported by World Meteorological Organization . The performance of these models in the TTL has been examined by Gettelman et al. . Here we perform quantitative analyses on a new set of models. The list of models and basic references is presented in Table 1.
Table 1. Description of Models Used in This Study^a
^a Horizontal resolution (Horiz. Res.) is in degrees of latitude (longitudes are 20–50% larger); the spectral truncation (T for triangular, R for rhomboidal) is given in parentheses if the model is not on a latitude-longitude grid. TTL levels are the number of model levels between 300 and 100 hPa.
 Further information on the attributes of each model is available in the references in Table 1, or from Morgenstern et al. , a comprehensive description of the models. Salient features of the models are noted here. CMAM is coupled to an ocean model, while the other models use specified Sea-Surface Temperatures (from observations or from another coupled model run for the future). Many of the models share a common heritage. E39CA, EMAC and (NIWA-) SOCOL are all based on the European Center Hamburg (ECHAM) GCM. The UMETRAC, UMSLIMCAT and UMUKCA models are based on the Unified Model (UM). However, UMUKCA and EMAC are based on newer versions of their respective parent models. WACCM and CAM3.5 share the heritage of the NCAR Community Atmosphere Model version 3.5. All models have an inorganic chemistry scheme including chlorine and bromine chemistry (except for E39CA). Only three models (CAM3.5, EMAC and ULAQ) have a comprehensive description of tropospheric chemistry. As indicated in Table 1, most models have 6–9 layers in the UTLS, corresponding to a vertical resolution of about 1 km. EMAC and E39CA have higher vertical resolution in this region (12 and 15 levels), while ULAQ and SOCOL have lower vertical resolution (3–5 levels). For most models the horizontal resolution is ∼200–300 km; ULAQ has significantly coarser resolution. The CCMVal-2 models include a larger set than CCMVal-1 (14 v. 11 models), and there are now 13 models with simulations to 2100 (v. 2 models in CCMVal-1). More importantly, 4 models are new and one has been discontinued. There are numerous changes to each model [see Morgenstern et al., 2010], and these are discussed where they are relevant to the results.
 Model simulations analyzed comprise two types of runs, as specified by Eyring et al. . The first are ‘historical runs’ from 1960–2005, with specified boundary conditions for the sea surface temperature (SST), and specified concentrations of greenhouse gases and halogens, known as ‘REF-B1’. Runs for the future from 1960–2100 are called ‘REF-B2’ and use emissions scenarios and SST fields as discussed by Eyring et al. .
2.2. Quantitative Diagnostics
 The list of diagnostics used in this study is shown in Table 2 and described in more detail below (and in each section). Diagnostics 1–4 have quantitative grades applied. Table 2 also indicates the data source(s) used for evaluation and grading. Some diagnostics (especially 6 and 7) required special outputs, often instantaneous output, and were not performed for all models. Monthly mean output is supplied on CCMVal-2 levels (see Figure 5).
Monthly means are used for analysis, except for instantaneous data, noted by a superscript ‘i’ in the table (e.g., T^i, U^i, V^i). Monthly means are on CCMVal-2 standard levels (shown in Figure 5), and instantaneous data are on model levels. Data sets are described in more detail in the text.
2.2.1. Diagnostic 1: Temperature of the Cold Point Tropopause
 It is critical that models reproduce the amplitude and phase of the annual cycle of temperature of the cold point tropopause (TCPT) as this regulates water vapor and total hydrogen in the stratosphere. Because of the non-linearity of the Clausius-Clapeyron equation regulating water vapor saturation vapor mixing ratios, the annual cycle is more important than the mean value over the year. This is a simplified diagnostic of the true ‘Lagrangian Cold Point’ which we can examine in only a few models and which is not quantitative (see below). One measure of uncertainty is the grading of re-analysis systems compared to each other (ideally all ‘observations’ should have a perfect grade of 1), which gives a sense of the variation between analysis models.
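The nonlinearity argument can be illustrated numerically. The sketch below is an illustration only: it assumes the Murphy and Koop (2005) fit for saturation vapor pressure over ice (the CCMs use their own formulations), a hypothetical sinusoidal TCPT annual cycle, and saturation evaluated at 80 hPa; `qsat_ice_ppmv` is a hypothetical helper name. Because QSAT is convex in temperature, the annual mean of monthly QSAT exceeds QSAT of the annual-mean temperature, so two models with the same mean TCPT but different annual cycles admit different amounts of water.

```python
import numpy as np


def qsat_ice_ppmv(t_k, p_hpa):
    """Saturation volume mixing ratio over ice, in ppmv.

    Assumption: Murphy and Koop (2005) fit for the saturation vapor
    pressure over ice (Pa); the CCMs use their own formulations.
    """
    t_k = np.asarray(t_k, dtype=float)
    ln_ei = 9.550426 - 5723.265 / t_k + 3.53068 * np.log(t_k) - 0.00728332 * t_k
    return 1e6 * np.exp(ln_ei) / (p_hpa * 100.0)


# Hypothetical TCPT annual cycle (K): 192 K mean, 3 K amplitude, coldest in February.
months = np.arange(12)
tcpt = 192.0 - 3.0 * np.cos(2.0 * np.pi * (months - 1) / 12.0)

q_of_mean = qsat_ice_ppmv(tcpt.mean(), 80.0)  # QSAT of the annual-mean TCPT
mean_of_q = qsat_ice_ppmv(tcpt, 80.0).mean()  # annual mean of monthly QSAT
q_coldest = qsat_ice_ppmv(tcpt, 80.0).min()   # QSAT in the coldest month

print(f"QSAT(mean T)        = {q_of_mean:.2f} ppmv")
print(f"mean of QSAT(T)     = {mean_of_q:.2f} ppmv")
print(f"QSAT(coldest month) = {q_coldest:.2f} ppmv")
```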
2.2.2. Diagnostic 2: Tropopause Pressure
 The pressure of the lapse rate tropopause (PTP) provides a basic measure of whether the tropopause is in the right location, and of how it varies over the annual cycle and in response to inter-annual forcing. Responses to major forced events (ENSO and volcanoes are included in historical runs) should resemble observations. Anomalies of lapse rate tropopause pressure have been shown to be more robust than TCPT in observations and models [Gettelman et al., 2009]. Simulated PTP anomalies can be compared to reanalysis systems. As described below, the grading for this diagnostic includes the correlation with inter-annual anomalies and the mean values from reanalysis systems in similar coordinates.
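A minimal sketch of the WMO lapse-rate tropopause criterion on a single profile follows. It assumes levels ordered from the surface upward, heights in km, and no interpolation between levels; the function name and the 500 hPa search limit are assumptions for illustration, not the CCMVal-2 processing code.

```python
import numpy as np


def lapse_rate_tropopause(z_km, t_k, p_hpa, p_max_hpa=500.0):
    """Index of the WMO lapse-rate tropopause in one profile, or None.

    WMO definition: the lowest level at which the lapse rate decreases to
    2 K/km or less, provided the average lapse rate between this level and
    all higher levels within 2 km does not exceed 2 K/km. Levels must be
    ordered upward. p_max_hpa (an assumption) skips the lower troposphere.
    """
    z = np.asarray(z_km, dtype=float)
    t = np.asarray(t_k, dtype=float)
    p = np.asarray(p_hpa, dtype=float)
    for k in range(z.size - 1):
        if p[k] > p_max_hpa:
            continue
        # Lapse rate of the layer just above level k (K/km, positive when T falls with height).
        gamma = -(t[k + 1] - t[k]) / (z[k + 1] - z[k])
        if gamma > 2.0:
            continue
        # Average lapse rate from level k to every higher level within 2 km.
        above = (z > z[k]) & (z <= z[k] + 2.0)
        mean_gamma = (t[k] - t[above]) / (z[above] - z[k])
        if np.all(mean_gamma <= 2.0):
            return k
    return None


# Hypothetical profile: 6.5 K/km up to 16 km, then warming at 1 K/km.
z = np.arange(0.0, 30.01, 0.5)
t = np.where(z <= 16.0, 300.0 - 6.5 * z, 196.0 + (z - 16.0))
p = 1000.0 * np.exp(-z / 7.0)
k = lapse_rate_tropopause(z, t, p)
print(f"tropopause at {z[k]:.1f} km, {p[k]:.1f} hPa")
```

PTP is then the pressure at the returned level; on coarse standard levels this quantizes PTP to the level spacing, which is one reason the grading in Equation 1 uses a fixed 10 hPa threshold for tropopause pressure.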
2.2.3. Diagnostic 3: Water Vapor Above the Cold Point Tropopause
 In conjunction with TCPT, the water vapor concentration above the cold point tropopause (CPT), at 80 hPa, is the dominant term in the total hydrogen budget of the stratosphere. This budget is important for radiation and chemistry (for example, Polar Stratospheric Cloud formation). Models should appropriately simulate the water vapor concentration in the lower tropical stratosphere and its annual cycle.
2.2.4. Diagnostic 4: Ozone in the TTL
 TTL ozone is affected by both transport and chemistry. TTL ozone is an important indicator of TTL processes, as well as another baseline indicator of the entry of air into the lower stratosphere. It can be a proxy for the entry of short lived species into the stratosphere (for which we do not have sufficient observations for CCM validation). Models should represent the vertical structure of ozone and its annual cycle. Ozone is also radiatively important in the TTL, and thus critical for a correct representation of the TTL thermal structure. Since ozone is chemically produced in the TTL by various processes, it is also an integrated measure of TTL chemistry processes and TTL transport time. Differences in ozone may be due to different chemical processes (for example NOx production by lightning), which may or may not be present in a given model.
 The following diagnostics do not include quantitative grades but provide a more detailed process-level view of model solutions. In most cases they required more detailed output than provided by most models, but they provide more insight into TTL processes.
2.2.5. Diagnostic 5: Correlations Between 80 hPa H2O Mixing Ratio and TCPT
 H2O at 80 hPa and TCPT can be compared by translating TCPT into water vapor using the saturation vapor mixing ratio (QSAT), a function of temperature and pressure. There should be a correlation between 80 hPa H2O and TCPT. Expressing TCPT as the saturation vapor mixing ratio at the cold point, QSAT(TCPT), the ratio H2O/QSAT(TCPT) should reflect the integrated effect of physical mixing processes and dehydration.
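The ratio and correlation described above can be computed as in this sketch. All series are synthetic placeholders (not model or HALOE data), QSAT uses the Murphy and Koop (2005) ice formula as an assumption, and QSAT(TCPT) is evaluated at 80 hPa for simplicity; the helper names are hypothetical.

```python
import numpy as np


def qsat_ice_ppmv(t_k, p_hpa):
    """Saturation volume mixing ratio over ice (ppmv); Murphy and Koop (2005) fit, an assumption."""
    t_k = np.asarray(t_k, dtype=float)
    ln_ei = 9.550426 - 5723.265 / t_k + 3.53068 * np.log(t_k) - 0.00728332 * t_k
    return 1e6 * np.exp(ln_ei) / (p_hpa * 100.0)


# Hypothetical 10-year monthly series.
rng = np.random.default_rng(0)
months = np.arange(120)
tcpt = 192.0 - 2.0 * np.cos(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 0.5, months.size)

# Assumption: QSAT(TCPT) evaluated at 80 hPa.
q_cpt = qsat_ice_ppmv(tcpt, 80.0)

# Synthetic 80 hPa H2O: 70% of QSAT(TCPT) plus small noise (ppmv).
h2o_80 = 0.7 * q_cpt + rng.normal(0.0, 0.1, months.size)

ratio = h2o_80 / q_cpt               # H2O / QSAT(TCPT), month by month
r = np.corrcoef(h2o_80, tcpt)[0, 1]  # correlation of 80 hPa H2O with TCPT
print(f"mean ratio = {ratio.mean():.2f}, r = {r:.2f}")
```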
2.2.6. Diagnostic 6: Tropopause Inversion Layer
 The Tropopause Inversion Layer (TIL) is a layer of increased static stability that occurs just above the tropopause [Birner, 2006]. The TIL provides an integrated view of the vertical dynamical structure of the TTL. It not only shows the separation between the stratosphere and troposphere, but also indicates whether models correctly represent the dynamical effects of convection in the upper troposphere, and transport and dynamics in the lower stratosphere. The static stability structure is sensitive to the radiative balance of the TTL, and hence to the transport of H2O and O3, as well as to large-scale dynamics.
2.2.7. Diagnostic 7: TTL Transport Pathways and Residence Time
 The transport time through the TTL is a complex diagnostic reflecting a mix of transport processes, including large-scale advection and mixing, as well as rapid convective motion in the vertical. Representing the transport time and pathways through the TTL is critically important for calculating the minimum temperature experienced by a parcel (which regulates water vapor). It is possible to alter stratospheric water vapor by changing transport pathways but not changing the mean temperature. Transport time is also critical for short lived species, whose lifetimes are less than a small multiple of the transport time. Several studies have attempted to assess the transport time, and here we will use Lagrangian trajectory studies to estimate transport times from a subset of models and compare them to observations.
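As an illustration of why pathways matter, the sketch below treats a parcel's final water vapor as the minimum saturation mixing ratio along its trajectory (a simple 'Lagrangian cold point' assumption with no supersaturation or microphysics). The trajectories and function names are hypothetical, and QSAT again uses the Murphy and Koop (2005) ice fit as an assumption. Two paths with the same mean temperature end with different water vapor.

```python
import numpy as np


def qsat_ice_ppmv(t_k, p_hpa):
    """Saturation volume mixing ratio over ice (ppmv); Murphy and Koop (2005) fit, an assumption."""
    t_k = np.asarray(t_k, dtype=float)
    p_hpa = np.asarray(p_hpa, dtype=float)
    ln_ei = 9.550426 - 5723.265 / t_k + 3.53068 * np.log(t_k) - 0.00728332 * t_k
    return 1e6 * np.exp(ln_ei) / (p_hpa * 100.0)


def lagrangian_cold_point(t_traj_k, p_traj_hpa):
    """Minimum QSAT along a trajectory (ppmv) and the index where it occurs.

    Assumes excess water is removed instantly down to the lowest saturation
    mixing ratio the parcel encounters (no supersaturation).
    """
    q = qsat_ice_ppmv(t_traj_k, p_traj_hpa)
    k = int(np.argmin(q))
    return float(q[k]), k


# Two hypothetical trajectories at 100 hPa with the same mean temperature (194 K);
# pathway A passes through a 190 K cold region.
q_a, k_a = lagrangian_cold_point([196.0, 190.0, 196.0], [100.0] * 3)
q_b, k_b = lagrangian_cold_point([194.0, 194.0, 194.0], [100.0] * 3)
print(f"pathway A: {q_a:.2f} ppmv, pathway B: {q_b:.2f} ppmv")
```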
 Grades are used to obtain quantitative information on model behavior for some diagnostics. Mean values of a certain quantity or the amplitude and phase of a seasonal cycle can be used as a grade. Here, quantitative grades are defined following Douglass et al.  and Waugh and Eyring , with extensions to look at variability. Grades are computed from monthly means after spatial averaging. Douglass et al.  define a grade based on monthly mean differences:
$$ g_m = 1 - \frac{1}{n}\sum_{i=1}^{n} \frac{\left|\mu_i^{mod} - \mu_i^{obs}\right|}{n_g\,\sigma_i^{obs}} \qquad (1) $$
Here, μi is a monthly mean quantity for month i from either a model (mod) or observations (obs), and n = 12. ng is a scaling factor representing a number of standard deviations (σ), and σi is calculated for each month (i). If a model is more than ng standard deviations from the observations, then gm = 0. We set ng = 3 (3σ threshold) for temperature and water vapor following Waugh and Eyring . Because tropopause pressure is estimated from a set of coarse resolution standard levels, variability in the observations (also interpolated to these levels) is very low. We therefore set the 3σ threshold (ngσobs) in Equation 1 to 10 hPa for tropopause pressure (reflecting an uncertainty of one CCMVal-2 level).
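Equation 1 can be sketched as follows, assuming arrays of 12 monthly values and assuming that the clipping to zero is applied to the averaged grade. The `threshold` argument (a hypothetical name) stands in for the fixed 10 hPa used for tropopause pressure; all data are invented.

```python
import numpy as np


def grade_mean(mu_mod, mu_obs, sigma_obs=None, n_g=3.0, threshold=None):
    """Monthly-mean grade g_m (Equation 1), set to 0 when negative.

    Give either sigma_obs (per month, scaled by n_g) or a fixed threshold
    that replaces n_g * sigma_obs (e.g. 10 hPa for tropopause pressure).
    """
    mu_mod = np.asarray(mu_mod, dtype=float)
    mu_obs = np.asarray(mu_obs, dtype=float)
    if threshold is not None:
        scale = float(threshold)
    elif sigma_obs is not None:
        scale = n_g * np.asarray(sigma_obs, dtype=float)
    else:
        raise ValueError("give sigma_obs or threshold")
    g = 1.0 - np.mean(np.abs(mu_mod - mu_obs) / scale)
    return max(float(g), 0.0)


obs = np.full(12, 191.0)   # hypothetical monthly TCPT climatology (K)
sigma = np.full(12, 1.0)   # hypothetical monthly standard deviations (K)
print(grade_mean(obs + 1.5, obs, sigma))  # 0.5: a 1.5 sigma offset with n_g = 3
print(grade_mean(obs + 6.0, obs, sigma))  # 0.0: beyond 3 sigma, clipped

ptp_obs = np.full(12, 100.0)  # hypothetical tropopause pressure (hPa)
print(grade_mean(ptp_obs + 5.0, ptp_obs, threshold=10.0))  # 0.5
```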
 We also define a grade based on correlated variability,
$$ g_c = r\left(\mu'^{mod}, \mu'^{obs}\right) \qquad (2) $$
where μ′ are anomalies from a mean quantity and r is the linear correlation coefficient.
For analysis here the correlation is taken on annual mean values, and thus reflects correlations of inter-annual variability between a model and observations.
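A sketch of the correlation grade gc on annual means follows, assuming monthly input series that span whole years; `grade_correlation` is a hypothetical name and the series are synthetic. A model with a mean bias but the right inter-annual anomalies gets gc = 1, which is why gc complements gm.

```python
import numpy as np


def grade_correlation(monthly_mod, monthly_obs):
    """g_c: linear correlation of inter-annual anomalies of annual means.

    Both inputs are monthly series covering the same whole years (length 12*N).
    """
    ann_mod = np.asarray(monthly_mod, dtype=float).reshape(-1, 12).mean(axis=1)
    ann_obs = np.asarray(monthly_obs, dtype=float).reshape(-1, 12).mean(axis=1)
    # Anomalies about the record mean; removing the mean does not change r,
    # but it mirrors the definition of mu' in the text.
    anom_mod = ann_mod - ann_mod.mean()
    anom_obs = ann_obs - ann_obs.mean()
    return float(np.corrcoef(anom_mod, anom_obs)[0, 1])


# Hypothetical 20-year series: a seasonal cycle plus an inter-annual signal.
years = 20
seasonal = np.tile(np.cos(2.0 * np.pi * np.arange(12) / 12.0), years)
signal = np.repeat(np.sin(np.arange(years)), 12)
obs = 100.0 + 5.0 * seasonal + signal
mod_in_phase = 95.0 + 4.0 * seasonal + 2.0 * signal  # biased, but same anomaly timing
mod_opposed = 100.0 + 5.0 * seasonal - signal        # anomalies of opposite sign

print(grade_correlation(mod_in_phase, obs), grade_correlation(mod_opposed, obs))
```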
 We can also define a diagnostic based on the magnitude of the monthly variance of a quantity:
Where σ is calculated each month (i) and n = 12.
 A single grade is then the linear combination: Gsum = (gm + gc + gv)/3. The composite grade is designed to better represent uncertainty and forced variability. This partly (but not completely or rigorously) addresses shortcomings in the application of grades recently identified by Grewe and Sausen .
 We have evaluated grades using several different estimates of σobs and μobs: from individual reanalysis systems, and from an ensemble of reanalysis systems. While the quantitative grades do change, the relative grades between models and the spread are robust across the different methods examined. For clarity, we report grades against one set of observations, and grade the other observational data sets against that set in each quantitative model summary figure to estimate the spread in grades from the observations. We also examine the multi-model mean, calculated by averaging model outputs to generate a multi-model μmod. Quantitative grades for individual components are reported. The goal of applying grades is to quantitatively determine model deficiencies with sufficient detail to understand where and why models perform or do not perform well.
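The sketch below combines these pieces: a multi-model mean formed by averaging over the model axis, gm for each model and for the mean, and the composite Gsum. The climatologies are invented, and gc and gv are passed in as given numbers rather than computed, since only their role in Gsum is illustrated here; function names are hypothetical. It shows how offsetting biases can give the multi-model mean a higher gm than any individual model.

```python
import numpy as np


def grade_mean(mu_mod, mu_obs, sigma_obs, n_g=3.0):
    """g_m from Equation 1, set to 0 when negative."""
    dev = np.abs(np.asarray(mu_mod, float) - np.asarray(mu_obs, float))
    g = 1.0 - np.mean(dev / (n_g * np.asarray(sigma_obs, float)))
    return max(float(g), 0.0)


def composite_grade(g_m, g_c, g_v):
    """G_sum = (g_m + g_c + g_v) / 3."""
    return (g_m + g_c + g_v) / 3.0


# Hypothetical monthly climatologies (K), shape (n_models, 12), with offsetting biases.
obs = np.full(12, 191.0)
sigma = np.full(12, 1.0)
models = np.stack([obs + 2.0, obs - 2.0, obs + 4.0])

mmm = models.mean(axis=0)  # multi-model mean: average over models, month by month
grades = [grade_mean(m, obs, sigma) for m in models]
print("individual g_m:", [round(g, 2) for g in grades])
print("multi-model mean g_m:", round(grade_mean(mmm, obs, sigma), 2))
print("G_sum example:", composite_grade(0.5, 1.0, 0.0))
```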
3. Observations and Analyses
 High quality measurements in the TTL and the global UTLS for the use of model validation are challenging to obtain. In-situ instruments on balloons or aircraft are challenged by the low pressure and low temperature conditions. Remote sensing techniques used to observe the stratosphere are challenged by saturation of the measured radiances in the UTLS in many commonly used wavelengths. Additional difficulties arise from the small vertical and horizontal length scales found in the chemical and dynamical fields in the UTLS – the result of the large dynamical variability in the tropopause region. Here an overview is given of the observational data sets used for the model-measurement comparisons in the UTLS in order to provide critical information about their accuracy, precision, and potential sampling issues.
3.1. Balloon Data
 A variety of balloon data sources are available and used in these analyses. The global radiosonde network provides a comprehensive view of the thermal structure of the UTLS. High vertical resolution radiosondes have provided a wealth of information about the TTL structure. However, inhomogeneities in radiosonde records over time often make use of raw records problematic for trend analysis, and care must be taken when trends are analyzed [Seidel and Randel, 2006].
3.2. Satellite Data: HALOE
 Recently, satellite instruments have achieved the technological maturity to remotely sound the UTLS from space, offering unprecedented temporal and spatial coverage of this region. Here we use water vapor observations from the Halogen Occultation Experiment (HALOE) on the UARS satellite [Russell et al., 1993]. HALOE H2O observations have been extensively validated [e.g., SPARC, 2000]. This validation and the 13 year record (1992–2004) give us high confidence in HALOE performance. More recent satellite measurements have not been thoroughly validated in the UTLS.
3.3. NIWA Ozone Data Set
 For comparisons of simulated ozone, we use the National Institute for Water and Atmosphere (NIWA) Ozone data set described by Hassler et al. . The data set is a 4D reconstruction (latitude, longitude, altitude and time) using satellite and ozonesonde measurements. The current version, as noted by Hassler et al. , does not correct for known data artifacts and may not be suitable for trend analysis. Here we use the data set for climatological comparisons only.
3.4. Meteorological Analyses
 Operational meteorological analyses are produced on a daily basis by weather forecast centers. These analyses (or ‘reanalyses’ if they are produced by consistent forecast models over time) are very valuable for model comparison, since they provide complete fields that are closely tied to observations, but with space scales and statistics similar to those of global models. Here we use analyses from the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP) described by Kalnay et al. , the NCEP and Department of Energy (NCEP2) described by Kanamitsu et al. , the Japanese Re-Analysis (JRA) described by Onogi et al. , the European Centre for Medium Range Weather Forecasts (ECMWF) 40 year re-analysis (ERA40) described by Uppala et al.  and ‘Interim’ analysis (ERAI) described by Uppala et al. . For information on the different reanalyses (ERA40, NCEP, JRA) the reader is referred to Randel et al.  and their references. Several caveats common to reanalyses must be noted. First, because of the inhomogeneity of input data, specifically the introduction of significant assimilation of satellite observations starting in the late 1970s, estimating trends from reanalysis systems across the late 1970s is difficult and generally not scientifically justified. Trend analysis for the period since the late 1970s is usually useful, and we will use these data to estimate ‘observed’ trends in the UTLS. Second, reanalysis systems can have systematic biases. Perhaps the most notable example is a significant warm bias in NCEP/NCAR reanalysis tropopause temperatures, caused by the selection of assimilated data [Pawson and Fiorino, 1998]. Thus the reanalyses need to be treated with some caution. For comparisons of temperature and the tropopause, we will use the ERA40 reanalysis, because of its high quality and relatively long (20 year) record for comparison.
 In this section we present results of quantitative diagnostics (1–4 in Table 2) and their grades first. We then discuss diagnostics that are not quantitative (5) or calculated on a subset of models (6–7). The latter diagnostics are useful for looking in more detail at the thermal structure and transport in the TTL.
4.1. Cold Point Tropopause Temperature
 The annual cycle of tropical TCPT for 18 CCMs is illustrated in Figure 1 using the REF-B1 CCMVal-2 model fields. Several reanalysis systems (ERA40, NCEP, NCEP2, JRA25, ERAI) are also shown. All reanalyses use monthly means interpolated to CCMVal-2 standard levels (noted on Figure 5), so that the models and reanalysis systems are on the same temporal and vertical grid. TCPT is the cold point temperature on these standard levels, with no further interpolation. The gray region is 3σ from the ERA40 reanalysis. In general, almost all models are able to reproduce the annual cycle. There are significant offsets between the models, but the monthly averages of 9 models are clustered within 3σ of the mean of ERA40, as seen in Figure 1 and in the quantitative grades (gm) in Figure 2. The multi-model mean is very close to ERA40 and ERAI, closer than the other analysis systems. These results are also better than those for the CCMVal-1 models reported by Gettelman et al.  due to the reduction of outliers and the addition of new or revised models that are closer to observations. Note that there is general quantitative agreement between the reanalyses, with ‘grades’ (compared to ERA40) ranging from 0.6–0.8 (Figure 2). Lower gm scores are largely due to mean monthly offsets (Equation 1). The amplitude and phase of the annual cycle are in good agreement between most observation systems and models. Note that NCEP and NCEP2 have a known warm TCPT bias [Pawson and Fiorino, 1998] that causes their gm score to be zero when compared to ERA40.
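Because TCPT is taken as the coldest standard level without interpolation, the calculation reduces to a minimum over levels. The sketch assumes a single (e.g. tropically averaged) monthly profile on a subset of standard levels mentioned in this paper (115, 100, 90 and 80 hPa), and adds a check against a mean ± 3σ envelope like the gray region in Figure 1; the function names and data are hypothetical.

```python
import numpy as np

# Assumption: a subset of CCMVal-2 standard levels near the tropical tropopause (hPa).
LEVELS_HPA = np.array([115.0, 100.0, 90.0, 80.0])


def cold_point_on_levels(t_profile, levels=LEVELS_HPA):
    """Return (TCPT, pressure) as the coldest standard level, with no interpolation."""
    t_profile = np.asarray(t_profile, dtype=float)
    k = int(np.argmin(t_profile))
    return float(t_profile[k]), float(levels[k])


def within_envelope(tcpt_mod, tcpt_obs_mean, tcpt_obs_sigma, n_sigma=3.0):
    """Flag months where a model TCPT lies inside the observed mean +/- n_sigma * sigma."""
    dev = np.abs(np.asarray(tcpt_mod, float) - np.asarray(tcpt_obs_mean, float))
    return dev <= n_sigma * np.asarray(tcpt_obs_sigma, float)


tcpt, p_cpt = cold_point_on_levels([196.5, 192.0, 193.5, 195.0])
flags = within_envelope([191.0, 196.0], [192.0, 192.0], [0.5, 0.5])
print(tcpt, p_cpt, flags)
```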
 Most models do not show strong long-term trends in TCPT, as indicated in Figure 3. The mean model trend is not significantly different from zero. NCEP and NCEP2 reanalyses show strong cooling, which is not seen in the ERA40, JRA25 or ERAI analyses (noted by Zhou et al. ). ERA40 and ERAI also do not have trends significant at the 99% level. Note that these ‘observed’ trends may differ from cooling trends reported from radiosondes [Gettelman and Forster, 2002; Seidel and Randel, 2006] because of the limited sampling of selected radiosonde stations and the gridding and interpolation to the CCMVal-2 standard vertical levels. The lack of agreement among reanalyses highlights the uncertainty in long-term variability of the TCPT.
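A minimal sketch of a trend significance test like the one behind the 99% statement follows. It assumes ordinary least squares on annual means, a 27-year record (25 degrees of freedom) and no correction for autocorrelation; the paper does not state its exact test, and the data and names are hypothetical.

```python
import numpy as np

# Two-sided 99% critical value of Student's t for 25 degrees of freedom
# (a 27-year record such as 1979-2005); change it for other record lengths.
T_CRIT_99_DOF25 = 2.787


def ols_trend(years, values):
    """Least-squares slope and its t statistic (no autocorrelation correction)."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    xc = x - x.mean()
    slope = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    resid = y - y.mean() - slope * xc
    dof = x.size - 2
    stderr = np.sqrt(np.sum(resid ** 2) / dof / np.sum(xc ** 2))
    return slope, slope / stderr


# Hypothetical annual-mean TCPT (K): a -0.05 K/yr trend plus a small alternating wiggle.
years = np.arange(1979, 2006)
tcpt = 191.0 - 0.05 * (years - 1979) + 0.01 * (-1.0) ** years

slope, t_stat = ols_trend(years, tcpt)
significant = bool(abs(t_stat) > T_CRIT_99_DOF25)
print(f"trend = {10 * slope:.2f} K/decade, significant at 99%: {significant}")
```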
 Inter-annual variability is also illustrated in Figure 3 and is used for estimating correlation grades (gc). Most models and reanalysis systems show warming of TCPT in 1991, associated with the eruption of Mt. Pinatubo. Some models have a warming that is much too large (CNRM-ACM, SOCOL, NIWA-SOCOL, MRI). This is factored into the grades for variability (gv), as described in Equation 3 and illustrated in Figure 2. In CNRM-ACM, the warming is due to excessive heating by volcanic aerosols. Other modes of tropical variability, such as the El Niño-Southern Oscillation (ENSO) or the Quasi-Biennial Oscillation (QBO), affect the tropical tropopause [Zhou et al., 2001], but their effects are not clearly seen in this low vertical resolution analysis, and many CCMs do not have a QBO. Inter-annual anomalies are not correlated between models and reanalyses, or between the reanalyses themselves.
4.2. Lapse Rate Tropopause Pressure
 The pressure of the lapse rate tropopause (PTP) has been shown to be a more robust diagnostic than TCPT [Gettelman et al., 2009]. PTP is more sensitive to increasing thickness of the layer below, whereas TCPT reflects a more vertically confined response. It is easier for a model to get the bulk thickness (latent heat release) right than the details of TCPT. This can be seen in the high (0.9 or 1.0) correlation grades gc of most reanalysis systems compared to ERA40 (Figure 4). Grades for 18 models are calculated based on the annual cycle (gm), variance about monthly means (gv) and inter-annual anomalies (gc). The meridional structure of tropopause pressure from models and analysis systems is shown in Figure 5. The models all broadly reproduce the observed tropopause structure. There are some differences in the pressure of the tropical tropopause, which all analysis systems place near the 100 hPa level (when interpolated to CCMVal-2 levels, which are the horizontal lines in Figure 5). Several models shift the tropopause up or down by a level. However, there are large differences in the diagnosed tropopause at high latitudes.
 Long-term changes in PTP from 20°S–20°N are shown in Figure 6. There is good agreement between inter-annual anomalies of most of the models, as well as trends in PTP. The simulated variability in models is higher than in the observations. Most models and analysis systems show decreases in PTP associated with volcanic events (Agung 1963, El Chichón 1982, Mt. Pinatubo 1991), though the model variability is larger. In particular, it is too large for CNRM-ACM, which jumps 2 levels (90 to 115 hPa). The anomalies for CNRM-ACM are also evident in TCPT. PTP grades indicate a high degree of consistency among the analysis systems, as noted above. CCMVal models can broadly reproduce trends and variability, but with too much variance.
 The annual cycle of tropical (20°S–20°N) ozone at 100 hPa from 18 models is illustrated in Figure 7. The annual cycle of ozone near the tropical tropopause reflects a combination of: (1) chemical production (ozone is produced in the TTL at a rate of a few parts per billion per day), (2) vertical transport of ascending air, and (3) mixing with stratospheric air from higher latitudes that contains more ozone. Air with higher ozone is likely to have either (a) ascended more slowly or (b) mixed with more high-latitude air. Air with lower ozone results from rapid transport in deep convection from the marine boundary layer. The seasonal cycle reflects these processes (chemical production and transport). Ozone is compared to the combined and processed NIWA observational data set [Hassler et al., 2008], and grades are based on the annual cycle and variance of this data set. Most models correctly reproduce the phase of the annual cycle of ozone in the tropics. Two models (UMSLIMCAT and CNRM-ACM) have a significantly different annual cycle of ozone (Figure 7). Many models have a lower amplitude (and mean), while ULAQ, UMUKCA-METO and UMUKCA-UCAM have a higher amplitude (and mean), perhaps indicating slow transport times in the TTL.
 The spread of model O3 values is reflected in many gm = 0 grades (Figure 8). The CCM spread is as large as in the CCMVal-1 models [Gettelman et al., 2009, Figure 8], with some of the same outliers (e.g., ULAQ). Note that the 3 models with tropospheric chemistry (CAM3.5, EMAC and ULAQ) do not perform consistently better: ULAQ is high, CAM3.5 and EMAC are low, and all have relatively low total (Gsum) grades. The higher altitude (lower pressure) tropopause in CAM3.5 and EMAC would tend to lower 100 hPa O3.
4.4. Water Vapor
 Water vapor in the lower stratosphere is critical for the chemistry and climate of the stratosphere, affecting both stratospheric chemistry by regulating total hydrogen as well as affecting UTLS temperatures through the radiative impact of water vapor [SPARC, 2000]. Thus reproducing the transport of water vapor through the tropical tropopause is a critical requirement of CCMs in the TTL. Representing the appropriate relationships between cold point temperature and water vapor is also critical, as it requires the appropriate representation of processes that regulate water vapor, at least at the large scale.
Figure 9 presents the annual cycle of water vapor from 16 CCMs and HALOE in the lower stratosphere just above the TTL and the cold point (80 hPa). UMUKCA models fix water vapor in the stratosphere and are not shown. As pointed out by Mote et al. , this is the entry point or ‘recording head’ of the stratospheric ‘tape recorder’ circulation. The transport associated with this circulation is discussed by Eyring et al. . Here we focus on the entry point. Most models are able to reproduce the annual cycle of water vapor, with a minimum in NH spring and a maximum in NH fall and winter. There is a wide spread in the ‘entry’ value of water vapor at this level, from 2–6 ppmv, with observations from HALOE closer to 3–4 ppmv. The spread results in 5 models with gm = 0 (Figure 10). The uncertainties in HALOE observations are discussed in detail by SPARC , and are less than ±20% at this level. The shading indicates 3σ inter-annual variability, which is similar to this 20% range. These results are slightly better than for the CCMVal-1 models [Gettelman et al., 2009] due to a tighter temperature range (Figure 1). The multi-model mean indicates that most models shift the water vapor minimum at 80 hPa 1–2 months too early, though the multi-model mean water vapor mixing ratio is very similar to HALOE. The annual cycle is virtually absent in UMETRAC, CNRM-ACM and CCSRNIES.
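The shift of the minimum can be quantified by comparing the months of the minima of two climatologies, wrapping the difference around the calendar. This is a sketch with synthetic cycles and a hypothetical function name; fitting the first annual harmonic would give a smoother phase estimate than `argmin` on monthly values.

```python
import numpy as np


def minimum_month_offset(cycle_mod, cycle_obs):
    """Months by which the model minimum lags (+) or leads (-) the observed one.

    Uses the month of the minimum of each 12-month climatology and wraps the
    difference into -6..5, so December vs January counts as one month.
    """
    d = int(np.argmin(cycle_mod)) - int(np.argmin(cycle_obs))
    return (d + 6) % 12 - 6


months = np.arange(12)
obs_cycle = 4.0 - np.cos(2.0 * np.pi * (months - 3) / 12.0)  # hypothetical, minimum in April
mod_cycle = 4.0 - np.cos(2.0 * np.pi * (months - 1) / 12.0)  # hypothetical, minimum in February
print(minimum_month_offset(mod_cycle, obs_cycle))  # -2: model minimum two months early
```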
4.5. Saturation at the Cold Point
 Another method of examining the dehydration process is to look at the relationship between TCPT and water vapor just above the cold point (80 hPa). This is a broad way of understanding integrated TTL transport and dehydration in the absence of data for off-line Lagrangian cold point calculations as in Section 4.7. TCPT regulates H2O [Brewer, 1949], so the relationship can be analyzed by looking at the ratio of water vapor to the saturation vapor mixing ratio at the cold point (QSAT(CPT)). For example, minimum ERA40 TCPT (Figure 1) is about 192K, which corresponds at 80 hPa to a QSAT of 5.5 ppmv. Figure 11 is an update of this relationship shown by Gettelman et al.  for 16 models.
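The worked number above can be checked with the Murphy and Koop (2005) fit for saturation over ice, which is an assumption here; other formulas differ slightly. With this formula, 192 K at 80 hPa gives about 5.7 ppmv, close to the 5.5 ppmv quoted, and the same sketch shows that a 1 K change in TCPT changes QSAT by roughly 15–20%. The helper name is hypothetical.

```python
import numpy as np


def qsat_ice_ppmv(t_k, p_hpa):
    """Saturation volume mixing ratio over ice (ppmv); Murphy and Koop (2005) fit, an assumption."""
    t_k = np.asarray(t_k, dtype=float)
    ln_ei = 9.550426 - 5723.265 / t_k + 3.53068 * np.log(t_k) - 0.00728332 * t_k
    return 1e6 * np.exp(ln_ei) / (p_hpa * 100.0)


q192 = qsat_ice_ppmv(192.0, 80.0)  # minimum ERA40-like TCPT at 80 hPa
q193 = qsat_ice_ppmv(193.0, 80.0)  # the same level, 1 K warmer
print(f"QSAT(192 K, 80 hPa) = {q192:.1f} ppmv")
print(f"a 1 K warming raises QSAT by {100 * (q193 / q192 - 1):.0f}%")
```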
 Note that the UMUKCA models have very high cold point temperatures (consistent with high ozone at 100 hPa as a result of slow transport times), so their water vapor was fixed (and they are not shown). The results indicate that most of the models cluster similarly to the observations (H2O from HALOE and TCPT from ERA40), near a line that would imply 70% saturation if temperatures and transport were constant; because they are not, water vapor is lower than implied by TCPT. Gettelman et al.  present results for 90 hPa, where the atmosphere is slightly drier and results are closer to a 0:0.6 line. The spread of the models is similar between CCMVal-1 and CCMVal-2. Three models are near the 1:1 line. MRI is high due to permitted ice supersaturation. However, 3 models (CNRM-ACM, CCSRNIES and UMETRAC) have significantly more lower stratospheric H2O than would seem to be justified by their TCPT. This indicates potential problems in fundamental transport, variability and/or condensation processes in the TTL. This is also clear from Figure 9 and the H2O grades (Figure 10).
4.6. Tropical Tropopause Inversion Layer
 Recent studies using high-resolution radiosonde data have revealed the presence of a temperature inversion layer, typically a few kilometers deep, located just above the tropopause [Birner et al., 2002; Birner, 2006; Bell and Geller, 2008]. This Tropopause Inversion Layer (TIL) is also characterized by a sharp and strong maximum in buoyancy frequency. The buoyancy frequency (also called the Brunt-Väisälä frequency) is defined by $N^2 = \frac{g}{\theta}\frac{\partial \theta}{\partial z}$, where g is the gravitational acceleration, θ is potential temperature and z is height. The presence of the TIL has been further confirmed by Global Positioning System (GPS) Radio Occultation (RO) data [Randel et al., 2007; Grise et al., 2010]. These independent measurements have shown that the TIL is present almost everywhere from the deep tropics to the pole in both hemispheres (Figures 12a and 12d), with a minimum in winter hemisphere polar regions. Although the mechanisms for the formation and maintenance of the TIL remain to be determined, its presence has potentially important implications for the cross-tropopause exchange of passive tracers and water vapor and for the dynamical coupling between the stratosphere and troposphere, and it has recently received significant attention.
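The N2 definition can be evaluated directly from a temperature profile. The following is a minimal sketch that assumes dry air (κ = R/cp ≈ 0.2857), potential temperature referenced to 1000 hPa, and a log-pressure height z = −H ln(p/p_ref) with H = 8 km, the scale height used for the TIL analysis. Setting p_ref to the tropopause pressure gives the tropopause-based coordinate. Names and constants are illustrative assumptions.

```python
import numpy as np

G = 9.81          # gravitational acceleration (m s^-2)
KAPPA = 0.2857    # R/cp for dry air (assumed)
P0_HPA = 1000.0   # reference pressure for potential temperature (hPa)
H_M = 8000.0      # scale height of the log-pressure height (m)


def potential_temperature(t_k, p_hpa):
    """theta = T (p0/p)^kappa."""
    return np.asarray(t_k, float) * (P0_HPA / np.asarray(p_hpa, float)) ** KAPPA


def buoyancy_frequency_sq(t_k, p_hpa, p_ref_hpa=P0_HPA):
    """N^2 = (g/theta) dtheta/dz in s^-2, with z = -H ln(p/p_ref).

    Profiles are 1-D arrays ordered in the vertical. np.gradient accepts the
    non-uniform z spacing and uses centered differences in the interior.
    """
    p = np.asarray(p_hpa, float)
    z = -H_M * np.log(p / p_ref_hpa)
    theta = potential_temperature(t_k, p)
    return G / theta * np.gradient(theta, z)


if __name__ == "__main__":
    p = np.linspace(300.0, 30.0, 200)  # hPa, bottom level first
    t = np.full_like(p, 200.0)         # isothermal test profile
    n2 = buoyancy_frequency_sq(t, p)
    # For an isothermal layer in log-p height, N^2 = g * kappa / H
    print(n2[1:-1].mean(), G * KAPPA / H_M)
```

For the isothermal test profile N2 ≈ gκ/H ≈ 3.5 × 10^-4 s^-2. A temperature inversion above the tropopause raises N2 above this value, which is the TIL signature.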
 The zonal-mean structure of the TIL, simulated by REF-B1 integrations for 9 models (listed in Figure 13) with available instantaneous data, is examined and compared with observations. The observed TIL is derived from the GPS-RO data set of the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) mission from April 2006 to April 2009, with about 2500–3000 soundings per day.
 All analyses are performed in a log-p coordinate with tropopause pressure (pTP) as the reference level, i.e., $z = -H \ln(p/p_{TP})$, where H is a scale height of 8 km. Note that the conventional log-p coordinate uses surface pressure as the reference level. At each model grid point (or COSMIC profile), tropopause pressure is first computed on the native model or GPS-RO vertical grid using the WMO lapse-rate tropopause definition. The instantaneous fields of interest, such as temperature and N2, are then interpolated onto the tropopause-based z coordinate using linear interpolation in log-p, and are averaged over longitude for DJF and JJA. The resulting seasonally averaged fields from each model are finally interpolated onto a 5-degree latitude grid to construct multi-model mean fields. The COSMIC data are also binned into 5-degree latitude intervals. The observed TIL is computed using both data at full (raw) levels and data only at CCMVal-2 standard levels (Figure 5). The degraded observations allow a more direct comparison of the simulated TIL with observations.
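The two steps above (locating the lapse-rate tropopause, then interpolating onto the tropopause-based coordinate) can be sketched as follows. This is a simplified, hypothetical implementation rather than the code used for the paper. It assumes profiles ordered from the bottom upward, heights from the hypsometric equation, no interpolation of the tropopause between levels, and a search that starts above 500 hPa to skip low-level inversions.

```python
import numpy as np

RD = 287.04      # gas constant for dry air (J kg^-1 K^-1)
G = 9.80665      # gravitational acceleration (m s^-2)
H_M = 8000.0     # scale height of the tropopause-based coordinate (m)


def hypsometric_heights(t_k, p_hpa):
    """Heights (m) of each level above the first, from the hypsometric equation.

    Assumes levels are ordered from high pressure (bottom) to low pressure (top).
    """
    t = np.asarray(t_k, float)
    p = np.asarray(p_hpa, float)
    dz = RD * 0.5 * (t[:-1] + t[1:]) / G * np.log(p[:-1] / p[1:])
    return np.concatenate(([0.0], np.cumsum(dz)))


def wmo_tropopause_pressure(t_k, p_hpa, p_max_hpa=500.0):
    """Simplified WMO lapse-rate tropopause (no interpolation between levels).

    Returns the pressure of the lowest level (at or above p_max_hpa) where the lapse
    rate falls to 2 K/km or less and the mean lapse rate to every level within the
    next 2 km also stays at or below 2 K/km. Returns NaN if no level qualifies.
    """
    t = np.asarray(t_k, float)
    p = np.asarray(p_hpa, float)
    z_km = hypsometric_heights(t, p) / 1000.0
    lapse = -np.diff(t) / np.diff(z_km)  # K/km for the layer above each level
    for i in range(len(lapse)):
        if p[i] > p_max_hpa or lapse[i] > 2.0:
            continue
        above = (z_km > z_km[i]) & (z_km <= z_km[i] + 2.0)
        mean_lapse = (t[i] - t[above]) / (z_km[above] - z_km[i])
        if np.all(mean_lapse <= 2.0):
            return float(p[i])
    return float("nan")


def to_tropopause_coordinate(field, p_hpa, p_tp_hpa, z_grid_m):
    """Interpolate field onto z = -H ln(p/p_TP); linear in z is linear in log-p."""
    z = -H_M * np.log(np.asarray(p_hpa, float) / p_tp_hpa)
    order = np.argsort(z)  # np.interp needs increasing abscissae
    return np.interp(z_grid_m, z[order], np.asarray(field, float)[order],
                     left=np.nan, right=np.nan)
```

Averaging the output of `to_tropopause_coordinate` over longitude and season then gives fields of the kind shown in Figure 12, up to the differences in detail noted above.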
 The analysis results and the average of the 9 models are summarized in Figure 12 in terms of N2. As shown in Figures 12a and 12d, distinct sharp maxima of N2 are located just above the tropopause (z = 0). They are generally stronger in the summer hemisphere than in the winter hemisphere, but show little interhemispheric difference: the N2 distribution in NH summer is quantitatively similar to that in SH summer. These findings are consistent with previous work [Randel et al., 2007; Grise et al., 2010].
Figures 12b and 12e show the N2 distribution for degraded GPS data. Maximum values of N2 are lower. In addition, their locations are somewhat higher than those in the raw data. The effect is small in the tropics and larger at high latitudes. This strong sensitivity is not surprising as both tropopause pressure and temperature, which directly affect the sharpness of the TIL [Bell and Geller, 2008], are underestimated in coarse resolution GPS data.
 The above results suggest that the CCMVal-2 models may be unable to reproduce the quantitative structure of the observed TIL simply because of their coarse vertical resolution. Data to perform the TIL analysis were not available for the two models with the highest vertical resolution (E39CA and EMAC). The simulated TIL (Figures 12c and 12f) is generally weaker and broader than observed in the full-resolution GPS RO data (Figures 12a and 12d), and more closely resembles the observational estimates at CCMVal-2 vertical resolution (Figures 12b and 12e). Analysis of higher vertical resolution runs from WACCM with 300 m vertical resolution in the UTLS (WACCM-hires) indicates that at higher vertical resolution this model has an increased peak N2 near the tropopause, in better agreement with GPS RO observations.
Figure 13 shows tropical profiles of N2 for 2 seasons from GPS observations, from 9 models and from WACCM-hires. The CCMVal-2 models underestimate N2 in the troposphere and misplace the tropical TIL. Simulated N2 in the tropical lower stratosphere is also much larger than observed by GPS RO, even at degraded resolution. The difference from observations might be caused by weaker adiabatic cooling associated with weak upwelling. Note that WACCM-hires has a larger peak N2, a sharper gradient, and a peak closer to the tropopause than the standard-resolution model. Two of the lower vertical resolution models analyzed (CCSRNIES and SOCOL; see Table 1) also have very broad TIL structures.
 It should be emphasized that, although the quantitative structure of the TIL is somewhat underestimated, the CCMVal-2 models successfully reproduce its qualitative structure, including its seasonality. In fact, the simulated TIL is more realistic than that derived from re-analysis data, especially in the extratropics [Birner et al., 2006]. This may be because the data ingested by the re-analysis systems degrade the TIL structure, either through error covariances or through the coarse vertical resolution of the assimilated data. Further discussion of the TIL in the extratropics is given by Hegglin et al. .
4.7. Transport in the TTL
 Lagrangian trajectory studies are established tools for studying transport processes in the tropical tropopause region, and in particular transport from the troposphere to the stratosphere [e.g., Hatsushika and Yamazaki, 2003; Bonazzola and Haynes, 2004; Fueglistaler et al., 2004]. Stratospheric water vapor is strongly correlated with the Lagrangian cold point [Fueglistaler and Haynes, 2005]. We analyze the minimum temperature (Tmin) and TTL residence time in two CCMVal-2 models, CMAM and E39CA, and compare them to ERA40 trajectories following the methodology of Kremser et al. . These models provided the instantaneous 6-hourly fields of temperature, winds and heating rates needed to perform the calculation. Two sets of Tmin calculations were performed using ERA40: a ‘standard’ calculation using 3D winds, and a diabatic calculation using vertical winds based on heating rates following Wohltmann and Rex . The diabatic calculation is referred to as the ‘reference’ calculation.
 The trajectories were analyzed to determine the geographical distribution of points where individual air masses encounter their minimum temperature and thus minimum water vapor mixing ratio (referred to as dehydration points) during their ascent through the TTL into the stratosphere. In addition, the residence times of air parcels in the TTL were derived.
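Identifying a dehydration point amounts to finding the minimum temperature along each trajectory. The sketch below shows this under stated assumptions. Trajectories are 1-D arrays sampled in time. The water vapor after dehydration is approximated by saturation over ice at Tmin using the Murphy and Koop (2005) fit, which is an assumption. The map counts points per box without the flux weighting a figure like Figure 14 may use. All names are hypothetical.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class DehydrationPoint:
    lon: float
    lat: float
    p_hpa: float
    t_min_k: float
    h2o_ppmv: float  # saturation mixing ratio at Tmin: the most H2O the parcel can retain


def qsat_ppmv(t_k, p_hpa):
    """Saturation mixing ratio over ice (ppmv); Murphy and Koop (2005) fit, assumed here."""
    e = math.exp(9.550426 - 5723.265 / t_k + 3.53068 * math.log(t_k) - 0.00728332 * t_k)
    return 1.0e6 * e / (p_hpa * 100.0 - e)


def dehydration_point(lon, lat, p_hpa, t_k):
    """Locate the Lagrangian cold point (Tmin) along one trajectory.

    Inputs are 1-D arrays sampled in time along the trajectory (e.g. 6-hourly).
    """
    i = int(np.argmin(t_k))
    t_min = float(t_k[i])
    return DehydrationPoint(float(lon[i]), float(lat[i]), float(p_hpa[i]),
                            t_min, qsat_ppmv(t_min, float(p_hpa[i])))


def dehydration_map(points, lon_edges, lat_edges):
    """Count dehydration points per lon-lat box (unweighted, a simplification)."""
    lons = [pt.lon for pt in points]
    lats = [pt.lat for pt in points]
    counts, _, _ = np.histogram2d(lons, lats, bins=[lon_edges, lat_edges])
    return counts
```

Comparing such maps and the mean Tmin between model-driven and ERA40-driven trajectories is what exposes the warm biases discussed next.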
 For all years analyzed, both CCMs have a warm bias in the temperatures at the dehydration points relative to the ERA40 reference calculation. The bias is about 6 K (E39CA) and 8 K (CMAM) in NH winter and about 2 K (E39CA) and 4 K (CMAM) in NH summer. This differs from the Eulerian temperature bias of the models (Figure 1): the Eulerian mean tropical temperature is about 3 K too low for E39CA and 1 K too high for CMAM. Thus the overall degree of dehydration simulated during transport of air into the stratosphere could be significantly too low, a known shortcoming of CCM simulations [Eyring et al., 2006]. Since the warm bias differs from the models' Eulerian TCPT bias, it is probably caused by deficiencies in transport.
Figure 14 shows that both CCMs fairly well reproduce the overall geographical distribution of dehydration points in the ERA40-based simulation in NH winter 1995–1996 (December–February, DJF). This suggests that the geographical distribution of dehydration points in winter is fairly robust. A closer look at the figure reveals that in E39CA the region of the main water vapor flux is shifted eastward compared to ERA40, and the model shows excessive water vapor transport through warm regions over Africa. CMAM compares very well with the reference calculation and, if anything, only slightly overestimates the water vapor transport over the warm regions of South America. These overestimates in warm regions are, however, sufficient to create a significant warm bias in the Lagrangian cold point estimates.
 In NH summer (June–August, JJA) 1996, the reference calculations show that water vapor transport into the stratosphere is clearly dominated by the Indian monsoon and downwind regions (not shown), similar to the findings of Fueglistaler and Haynes . CMAM largely reproduces this feature, including its location, but overestimates the water vapor flux through the warm regions over Africa. In E39CA the impact of the Indian monsoon is not well reproduced, and dehydration in NH summer 1996 occurs mostly over the central Pacific rather than over India and the westernmost Pacific. These differences indicate deficiencies in TTL transport, as distinct from the Eulerian transport discussed in Section 4.5.
 The residence times in the upper part of the TTL (θ = 385–395 K) were derived from the trajectory calculations to examine the time scales of transport through the TTL, a key parameter for chemical transformation of air before it enters the stratosphere. The average residence time in this layer in the ERA40 diabatic calculations is about 9 days (DJF) and 12 days (JJA). These times are halved (faster transport) if the ‘standard’ winds are used. CMAM trajectories remain in the TTL for about 11 days (DJF) and 10 days (JJA), but the distribution has a long tail extending to residence times of up to 30 days. E39CA residence times are 6 days in both seasons, with a distribution similar to ERA40. Thus neither model captures the seasonal contrast in residence time found in ERA40.
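A residence time of this kind can be computed from the potential temperature history of each trajectory. The sketch below assumes that residence time is the number of trajectory samples inside the 385–395 K layer multiplied by the sampling interval. The paper does not state its exact definition, and this estimate is only as fine as the time step. Function names are hypothetical.

```python
import numpy as np


def residence_time_days(theta_k, dt_hours, theta_bottom=385.0, theta_top=395.0):
    """Time (days) a trajectory spends with theta_bottom <= theta < theta_top.

    Assumption: counts samples inside the layer and multiplies by the sampling
    interval dt_hours; re-entries into the layer are simply added up.
    """
    theta = np.asarray(theta_k, dtype=float)
    inside = (theta >= theta_bottom) & (theta < theta_top)
    return float(inside.sum()) * dt_hours / 24.0


def summarize(residence_days):
    """Mean, median and 95th percentile of residence times across trajectories."""
    r = np.asarray(residence_days, dtype=float)
    return {"mean": float(r.mean()),
            "median": float(np.median(r)),
            "p95": float(np.percentile(r, 95))}
```

For a steady ascent of 0.25 K per 6-hourly step through the 10 K layer, `residence_time_days` returns 10 days. The 95th percentile in `summarize` is one simple way to quantify a long tail such as the one noted for CMAM.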
 The CCMVal-2 ‘historical’ (past) and ‘future’ model runs provide a unique multi-model ensemble to examine trends in the UTLS. UTLS trends for CCMVal-1 models, and for Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report (AR4) models, have recently been analyzed by Gettelman et al. , and Son et al. [2009b, 2009a]. Historical trends have also been presented for REF-B1 historical simulations in the context of validating the models against observations (Figures 3 and 6). Here we further discuss historical trends and present some basic results of future trends in the UTLS from CCMVal-2 models. We present key trends from the simulations in the tropical UTLS, and in the extratropical LMS below the tropical tropopause that may impact the TTL. For the latter we focus only on tropopause pressure and O3. More details on extratropical diagnostics are in the companion paper by Hegglin et al. .
 Future runs were processed using zonal mean data. As noted by Son et al. [2009b] and Gettelman et al. , the use of zonal mean temperatures does not significantly affect values or trends of derived tropopause parameters. We have further validated this by calculating PTP and TCPT for four models (CMAM, CCSRNIES, MRI and SOCOL) from both 2D zonal monthly mean and 3D monthly mean temperatures. The magnitudes of the trends differ by less than ±10%, and their significance does not change.
5.1. Tropical Tropopause Trends
 Tropical PTP in the models over the historical period is well constrained. Historical trends in the REF-B1 simulations are similar to those in analysis systems and indicate a decrease in pressure (Figure 6). The robustness of the tropopause pressure grade was also noted for CCMVal-1 models by Gettelman et al. . Almost all models have historical trends that are close to observations and highly significant. Over 1980–1999, analyses have trends of −0.4 hPa/decade, and model trends are slightly larger in magnitude (−0.3 to −0.9 hPa/decade). The four ‘best’ models (CMAM, E39CA, GEOSCCM, WACCM: see Section 6) have a mean trend of −0.6 hPa/decade. Inter-annual variability is highly correlated with observations, and generally small. Model absolute values of pressure vary: many are close to the observations, but several models are one standard level (10–15 hPa) above or below. Decreases in pressure are also generally larger in the subtropics, where the tropopause gradients are large, which implies a meridional shift in the tropopause. Future trends (from REF-B2 runs) are illustrated in Figure 15. Note that for the multiple ensemble members of WACCM (3) and CMAM (2), the future trends are quantitatively the same for the individual members and for the single-model ensemble mean. There are some large differences in trends among the models. CMAM, UMSLIMCAT, UMUKCA-METO and CNRM-ACM have larger future trends (−10 to −15 hPa per century) than the other models (−5 hPa per century). The multi-model mean is about −7 hPa per century. In the case of CMAM, this appears to be due to a large increase in the simulated future Brewer-Dobson Circulation [McLandress et al., 2010].
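Trends such as −0.4 hPa/decade and their significance can be estimated with ordinary least squares. The sketch below uses only NumPy and returns the trend per decade together with its t-statistic. It assumes independent residuals. Serial correlation in real tropopause series reduces the effective sample size, and the paper does not say how it handled this. The example series is synthetic.

```python
import numpy as np


def linear_trend(years, values, per_years=10.0):
    """Ordinary least-squares trend and its t-statistic.

    Returns (trend per `per_years` years, t-statistic of the slope).
    Assumption: residuals are independent (no autocorrelation correction).
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    xa = x - x.mean()
    sxx = np.sum(xa ** 2)
    slope = np.sum(xa * (y - y.mean())) / sxx
    resid = y - y.mean() - slope * xa
    stderr = np.sqrt(np.sum(resid ** 2) / (len(x) - 2) / sxx)
    return slope * per_years, slope / stderr


if __name__ == "__main__":
    # Synthetic 1980-1999 tropopause pressure: -0.04 hPa/yr plus +/-0.1 hPa alternation
    years = np.arange(1980, 2000)
    p_tp = 100.0 - 0.04 * (years - 1980) + 0.1 * (-1.0) ** np.arange(20)
    trend, t_stat = linear_trend(years, p_tp)
    print(f"trend = {trend:.3f} hPa/decade, t = {t_stat:.1f}")
```

With 20 annual values (18 degrees of freedom), |t| above roughly 2.9 corresponds to significance at the 99% level for a two-sided test. Passing `per_years=100.0` gives trends per century for the REF-B2 comparisons.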
 Historical tropical cold point temperature trends for the REF-B1 runs are illustrated in Figure 3. The models do not show the cooling over the last 25 years seen in NCEP and NCEP2. However, the spatial distribution of the historical trends shows coherent patterns of warming and cooling. In general, the patterns represent alterations to the equatorial Kelvin and Rossby wave patterns induced by changes in the strength of an equatorial heat source [Gill, 1980], where the heat source variations are changes in convection. Different models, however, place these patterns in different locations in the tropics. For the subset of models with cloud variables, historical trends indicate cooling in the western Pacific and increases in clouds there, while some models indicate cooling in other regions. Overall, cooling in some regions balances warming in others, giving little net historical trend. This indicates that TCPT patterns respond to changes in tropical deep convection. Confidence in the trends from analysis systems may be limited by the sparse input data used to constrain the analysis models in the tropics.
 TCPT future trends (from REF-B2 runs) are illustrated in Figure 16. Most models (including the best performing ones) show a slow increase in minimum temperature of 0.5–1.0 K per century. Several models (ULAQ, UMUKCA-METO) have larger future trends. As seen in Figure 9, the future temperature trends will have implications for future water vapor trends, and do have implications for future cloud trends as well.
5.2. TTL Water Vapor Trends
 No consistent observations of historical water vapor trends exist over long periods of time. There are indications of long-term increases in water vapor from a variety of records [SPARC, 2000], and HALOE observed an increase in water vapor in the 1990s followed by a step decrease after 2000. The overall historical trend in HALOE H2O from 1992–2004 is negative (−0.05 ppmv yr−1) and significant at the 99% level. Almost all models also simulate a negative H2O trend over this period, with a multi-model mean of −0.03 ppmv yr−1. If one model with high variance (CNRM-ACM) is excluded from the multi-model mean, the trend is significant at the 99% level.
 The long-term observed increase is broadly consistent with increases in methane in the latter half of the 20th century. Recent changes in water vapor (since 1992) are broadly consistent with changes in the tropical tropopause temperature (see Section 4.4 and Randel et al. ). The changes in TCPT are partially related to changes in tropical upwelling induced by SST anomalies [Rosenlof and Reid, 2008]. Thus CCMs can translate surface forcing into lower stratospheric water vapor changes.
 Future changes in water vapor just above the cold point are illustrated in Figure 17. The figure also includes multiple ensemble members from WACCM (3) and CMAM (2), which confirm that the future trends of these two models differ from each other but are consistent across ensemble members of the same model. Models generally indicate that water vapor in the lower stratosphere will increase. Most model future trends are 0.5–1.0 ppmv per century, or nearly 25%. Methane oxidation contributes very little to these trends at 80 hPa, so it is unlikely to be their cause. The trends are consistent with the magnitude of future TCPT trends: temperature increases of 0.5–1 K per century at 193 K translate into a 0.5–1 ppmv per century increase in water vapor. Models with larger future temperature trends, or a stronger correlation between water vapor and temperature, indicate larger future increases in water vapor. This is true, for example, of ULAQ and CMAM (large T increase) as well as MRI, CNRM-ACM and CCSRNIES (strong dependence of H2O on T). SOCOL indicates a large change in water vapor without a large change in temperature. Note that the UMUKCA models (fixed water vapor) and GEOSCCM (output problem with water vapor) are not included in the analysis of REF-B2. Future water vapor trends are also illustrated in Figure 18, which shows larger trends in the upper tropical troposphere at the convective outflow level near 200 hPa.
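The mapping from 0.5–1 K of warming at 193 K to roughly 0.5–1 ppmv of water vapor can be checked with Clausius-Clapeyron scaling. The following sketch rests on three assumptions: the Murphy and Koop (2005) ice saturation fit, evaluation at 80 hPa, and H2O held at a fixed fraction of QSAT (about 0.7, following the saturation ratio discussed in Section 4.5). The function names are hypothetical.

```python
import math


def qsat_ppmv(t_k, p_hpa):
    """Saturation mixing ratio over ice (ppmv); Murphy and Koop (2005) fit, assumed here."""
    e = math.exp(9.550426 - 5723.265 / t_k + 3.53068 * math.log(t_k) - 0.00728332 * t_k)
    return 1.0e6 * e / (p_hpa * 100.0 - e)


def h2o_change_per_kelvin(t_k, p_hpa, saturation_ratio=1.0, dt=0.5):
    """Centered-difference estimate of d(H2O)/dT in ppmv/K.

    Assumption: H2O above the cold point stays a fixed fraction
    (saturation_ratio) of QSAT at the cold point temperature.
    """
    dq = qsat_ppmv(t_k + dt, p_hpa) - qsat_ppmv(t_k - dt, p_hpa)
    return saturation_ratio * dq / (2.0 * dt)


if __name__ == "__main__":
    for ratio in (1.0, 0.7):
        rate = h2o_change_per_kelvin(193.0, 80.0, saturation_ratio=ratio)
        print(f"ratio {ratio}: {rate:.2f} ppmv/K, "
              f"{0.5 * rate:.2f}-{1.0 * rate:.2f} ppmv for 0.5-1 K")
```

At 193 K and 80 hPa this gives about 1.1 ppmv/K at saturation (roughly 16% per kelvin), or about 0.8 ppmv/K with a 0.7 ratio. So 0.5–1 K of warming corresponds to a few tenths of a ppmv up to about 1 ppmv, the same order as the simulated trends.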
5.3. Tropopause Relative Trends
 Radiatively active tracers such as H2O and O3 exhibit large gradients across the tropopause. The radiative response to changes in these tracers is therefore expected to be highly sensitive to the detailed structure of the trends of H2O and O3 in the global UTLS [Randel et al., 2007]. Generally, one expects the trends in absolute (e.g. pressure) coordinates to be affected by tropopause height trends. Therefore we show two sets of future trends, in absolute coordinates as well as in tropopause-based coordinates to highlight the sensitivity of trends to the tropopause. Trends are calculated based on the zonal monthly mean output with respect to the tropopause obtained from the zonal monthly mean temperature data.
Figure 18 shows the multi-model ensemble of annual mean trends of O3 (Figure 18, top) and H2O (Figure 18, bottom) for the period 1960–2100, based on the 9 REF-B2 models with data over 1960–2100: CAM3.5, CCSRNIES, CMAM, LMDZ-repro, MRI, SOCOL, ULAQ, UMSLIMCAT, and WACCM. Figure 18 (left) shows future trends in conventional (absolute) coordinates, whereas Figure 18 (right) shows future trends in tropopause-based coordinates. The latter are obtained by first calculating the decadal shift in tropopause pressure and then shifting the decadal changes of the respective field (O3 or H2O) to a reference tropopause pressure. The shift in the tropopause is shown in Figure 18 (left). The average over 1960–1980 is used as the reference state.
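One plausible reading of the tropopause-based procedure is to resample each decadal profile so that its tropopause lands on the reference (1960–1980) tropopause pressure, preserving log-pressure distance from the tropopause, before computing changes. The sketch below implements that reading. It is an assumption about the method, not the paper's code, and the function names are hypothetical.

```python
import numpy as np


def shift_to_reference_tropopause(field, p_hpa, p_tp_hpa, p_tp_ref_hpa):
    """Resample a profile so that its tropopause sits at p_tp_ref_hpa.

    The output value at level p is taken from level p * p_tp / p_tp_ref of the
    input, which preserves log-pressure distance from the tropopause.
    Interpolation is linear in log(p); levels mapping outside the input are NaN.
    """
    p = np.asarray(p_hpa, dtype=float)
    f = np.asarray(field, dtype=float)
    log_p = np.log(p)
    order = np.argsort(log_p)  # np.interp needs increasing abscissae
    p_source = p * (p_tp_hpa / p_tp_ref_hpa)
    return np.interp(np.log(p_source), log_p[order], f[order], left=np.nan, right=np.nan)


def percent_change(field, field_ref):
    """Percent change of a (shifted) profile relative to a reference-period profile."""
    f_ref = np.asarray(field_ref, dtype=float)
    return 100.0 * (np.asarray(field, dtype=float) - f_ref) / f_ref
```

Applying `percent_change` to shifted and unshifted profiles illustrates how the same field can show opposite trends in the two coordinate systems when the tropopause rises.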
 Future O3 trends in the tropical lower stratosphere are negative (−2% decade−1) in conventional coordinates. Decreasing O3 is consistent with a strengthening of tropical upwelling (an enhancement of the BDC). Moderate increases of around 0.5–1.5% decade−1 are found throughout the upper troposphere and in the extratropical lower stratosphere. These results are consistent with those of Hegglin and Shepherd  and Li et al.  in the tropics and mid-latitudes, but differ in the SH polar regions. In tropopause-based coordinates, however, the future trends are strongly positive above the tropopause in both the tropics and extratropics (4–5% decade−1). In the tropics, therefore, the sign of the trend is reversed between conventional and tropopause-based coordinates. In conventional coordinates, faster upwelling in an enhanced BDC decreases O3 at any given pressure level, which may be a direct result of higher tropical SST [Deckert and Dameris, 2008].
 But the gradient of ozone around the tropopause increases as the tropopause moves to higher altitudes, so relative to the tropopause, O3 increases. This future trend is larger than the decrease at fixed altitude/pressure due to the strengthened BDC. In the extratropical lower stratosphere both contributions are positive (increasing BDC increases ozone) and are therefore amplified in tropopause-based coordinates.
 Starting from a realistic upper tropospheric (UT) base state, H2O exhibits strong positive future trends in the UT. The base state has high humidity in tropical convective outflow regions and low humidity in the down-welling branches of the Hadley and Walker circulations [Gettelman and Birner, 2007]. In the tropical UT, maximum future trends of 7–8% decade−1 are found around 200 hPa. These future trends are likely due to increases in surface to middle tropospheric temperature associated with anthropogenic greenhouse gas induced warming. In conventional coordinates one also finds rather strong positive changes of 3–5% decade−1 throughout the extratropical LMS. However, these changes in the LMS are in part caused by the future upward tropopause trend: in tropopause-based coordinates the strong positive trend in H2O is largely confined to the upper troposphere, whereas stratospheric H2O shows moderate changes of around 2% decade−1 throughout the global lower stratosphere.
 Increases in H2O coincide with significant increases in cloud frequency of occurrence. Only three models provided 3D TTL cloud fields for REF-B1: CAM3.5, LMDZrepro and WACCM. For all three models, the historical trend in fractional cloud coverage (cloudiness) averaged over 200–100 hPa for 1960–2005 was +0.0015/decade (absolute) and significant. With an average cloud fraction of 0.05, this represents a 3%/decade increase in TTL cloudiness. Unfortunately, no cloud observations of comparable precision exist for a similar period, and existing determinations of cloud fractions in the TTL vary strongly with instrument sensitivity. For future scenarios, results were available for 2 models (CAM3.5 and 3 WACCM realizations). CAM3.5 and WACCM are essentially versions of the same underlying tropospheric GCM, so for clouds they should be considered as 4 realizations of a similar model. Future trends in TTL cloudiness are significant at the 99% level and similar to REF-B1: +0.0012/decade (absolute), 2.5%/decade, or 25% over the 21st century (35% over the 1960–2100 period). Future trends in cloudiness are driven not by future temperature trends (the local temperature is increasing), but by increases in water vapor of 4–9% decade−1 (Figure 18), modulated (reduced) by increasing temperature.
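The relative cloudiness trends quoted above follow from dividing the absolute trend by the mean TTL cloud fraction of 0.05. The short calculation below reproduces them. It assumes linear, non-compounded accumulation over 10 decades (21st century) and 14 decades (1960–2100). The function name is hypothetical.

```python
def relative_trend_percent(abs_trend_per_decade, mean_fraction):
    """Convert an absolute cloud-fraction trend into percent of the mean per decade."""
    return 100.0 * abs_trend_per_decade / mean_fraction


MEAN_TTL_CLOUD_FRACTION = 0.05  # average 200-100 hPa cloud fraction quoted in the text

hist = relative_trend_percent(0.0015, MEAN_TTL_CLOUD_FRACTION)    # REF-B1, 1960-2005
future = relative_trend_percent(0.0012, MEAN_TTL_CLOUD_FRACTION)  # REF-B2 future runs

print(f"historical: {hist:.1f} %/decade")
print(f"future: {future:.1f} %/decade, {10 * future:.0f} % per century, "
      f"{14 * future:.0f} % over 1960-2100 (linear, not compounded)")
```

The future case gives 2.4%/decade, 24% per century and about 34% over 1960–2100, which the text rounds to 2.5%, 25% and 35%.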
5.4. Extratropical Tropopause Trends
 Trends in extratropical tropopause pressure for future scenarios are shown as anomalies over the south (Figure 19, left) and north (Figure 19, right) polar caps for REF-B2 simulations from 1960–2100. Multiple ensemble members are shown for WACCM and CMAM. As in the tropics, PTP is expected to decrease in both hemispheres. The magnitude of the overall future trends (−20 hPa per century) is not quantitatively different between hemispheres over the 21st century. However, there are clear differences in future polar tropopause pressure trends between the hemispheres: the trends in the SH polar regions are not steady, but are larger from 1960–2000 and smaller (flatter) from 2000–2050. As noted by Son et al. [2009b] in comparing IPCC AR4 models with and without ozone depletion, these differences are due to the effects of ozone depletion (1960–2000) and recovery (2000–2050).
 Quantitative trends were examined in 3 different periods, broadly characterized by ozone loss (1960–2000), ozone recovery (2001–2050), and steady ozone (2051–2099). SH tropopause pressure decreases more strongly during the ozone loss period (−0.5 hPa/yr), is flat or increases during ozone recovery, and decreases slightly during steady ozone period (−0.2 hPa/yr). Throughout all these periods there are changes in anthropogenic greenhouse gas concentrations, climate and surface temperature. In the NH, by contrast, future trends are similar in all periods and slightly negative (−0.2 hPa/yr).
5.5. Extratropical Ozone Trends
Figure 18 indicates changes in ozone in the extratropical LMS in the 21st century. Figure 20 shows time series of O3 anomalies for the SH (Figure 20, left) and NH (Figure 20, right) averaged over the LMS (40–60° latitude, 200–100 hPa). Trends are similar if different averaging domains are used. Future O3 trends in the SH are strongly influenced by anthropogenic O3 depletion and recovery and are not monotonic, whereas NH future O3 trends are broadly monotonic in the 21st century. Since most CCMVal-2 models do not include tropospheric ozone chemistry, and those that do (CAM3.5) do not simulate different trends, these future trends must be due to changes in transport: either decreases in isentropic transport from the tropics (a reduced fraction of tropical air) or enhanced descent in the BDC. Overlaid on this trend is likely a moderate ozone depletion and recovery effect, especially evident in the SH. For the NH region in Figure 20, the future trends of +2% decade−1 indicate an increase of nearly +30% (0.1 ppmv) by the end of the 21st century relative to present (year 2000) conditions. The change is largest and most significant just above the tropopause (Figure 18).
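A box average of the kind used for Figure 20 can be sketched as follows. The sketch assumes zonal-mean ozone arrays shaped (time, level, latitude), cos(latitude) area weighting, equal weights for the levels inside 200–100 hPa (mass weighting by layer thickness is an alternative), and anomalies expressed relative to the year 2000 value. The paper's exact weighting and baseline are not stated, and the names are hypothetical.

```python
import numpy as np


def lms_box_mean(o3, lat_deg, p_hpa, lat_range=(40.0, 60.0), p_range=(100.0, 200.0)):
    """Area-weighted mean of o3[time, level, lat] over a latitude/pressure box.

    Latitudes are weighted by cos(latitude); levels inside the box get equal weight
    (an assumption). Use lat_range=(-60.0, -40.0) for the SH box.
    """
    lat = np.asarray(lat_deg, dtype=float)
    p = np.asarray(p_hpa, dtype=float)
    in_lat = (lat >= lat_range[0]) & (lat <= lat_range[1])
    in_lev = (p >= p_range[0]) & (p <= p_range[1])
    box = np.asarray(o3, dtype=float)[:, in_lev, :][:, :, in_lat]
    w = np.cos(np.deg2rad(lat[in_lat]))
    lat_mean = (box * w).sum(axis=-1) / w.sum()  # shape (time, level)
    return lat_mean.mean(axis=-1)                 # shape (time,)


def percent_anomaly(series, years, base_year=2000):
    """Percent difference of each value from the value in base_year."""
    s = np.asarray(series, dtype=float)
    base = s[np.asarray(years) == base_year][0]
    return 100.0 * (s - base) / base
```

Fitting a linear trend to the output of `percent_anomaly` then gives percent-per-decade values comparable to the +2% decade−1 quoted for the NH.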
6. Summary and Conclusions
6.1. Quantitative Diagnostics and Discussion
Figure 21 includes the grading obtained for four diagnostics and provides an overall assessment of how well the models performed in the TTL. There are 4 models that score at least 0.5 on all 4 diagnostics and have consistent transport and trends: CMAM, E39CA, GEOSCCM and WACCM. The multi-model mean scores highly on all the quantitative diagnostics. There are 5 more models that have 3 of 4 grades above 0.5 (AMTRAC, CAM3.5, MRI, UMETRAC, ULAQ). These thresholds are quantitatively arbitrary, but every model below this threshold has a significant deficiency in the TTL noted in the paper. None of the highest scoring models has any obvious deficiency in the formulation of TTL processes (e.g., H2O above the TCPT is appropriate for TCPT), though they may still have biases (e.g., individual grade components like gm = 0). Models with obvious deficiencies score significantly lower on specific grades or components of grades. The addition of components for variance and correlation allows further insight into processes. We have not investigated the statistical significance of these grades, discussed by Grewe and Sausen , and leave that as a subject for future work.
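For reference, the mean component of such grades can be written in the form proposed by Waugh and Eyring (2008): g = 1 − |μ_model − μ_obs| / (n_g σ_obs) with n_g = 3, set to zero when negative. The sketch below assumes that form for gm and takes σ_obs as the standard deviation of the observed series (e.g. inter-annual variability). The variance and correlation components added in this paper are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def mean_grade(model, obs, n_g=3.0):
    """Grade for the mean of a diagnostic, in the Waugh and Eyring (2008) form.

    g = 1 - |mu_model - mu_obs| / (n_g * sigma_obs), clipped at 0.
    Assumption: sigma_obs is the sample standard deviation of the observed series;
    the variance and correlation grade components are not included here.
    """
    obs = np.asarray(obs, dtype=float)
    mu_m = float(np.mean(model))
    mu_o = float(obs.mean())
    sigma_o = float(obs.std(ddof=1))
    return max(0.0, 1.0 - abs(mu_m - mu_o) / (n_g * sigma_o))
```

With n_g = 3, a grade of 0.5 corresponds to a model mean 1.5 σ_obs away from the observations, and gm = 0 to a difference of 3 σ_obs or more.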
6.2. Qualitative Discussion
 Most models reproduce the annual cycle of tropical cold point temperatures, including its amplitude and timing. Some significant biases remain between models: the UMUKCA model temperatures are too high, and CNRM-ACM and CCSRNIES temperatures are too low. CNRM-ACM has too large a response to volcanic perturbations, and the responses of SOCOL and Niwa-SOCOL are also large. Most models do not have strong trends in TCPT over the historical period. Re-analysis systems also disagree regarding TCPT trends over the satellite period (since 1980).
 Most models place the tropical tropopause pressure at the right level (about 100 hPa). The UMUKCA models have higher (120 hPa) PTP, which may be a reason for their tropopause temperature warm bias. The high PTP in UMUKCA models may be a function of a slightly different vertical structure in the tropopause region, and a slower BDC. CNRM-ACM, CCSRNIES, the SOCOL models and EMAC have lower tropopause pressures. Most models have historical trends in tropopause pressure consistent with observations. Again, CNRM-ACM has too large a response to volcanic events. In general model variance is higher than observed inter-annual variance of tropopause pressure. Trends are consistent between models and analysis systems and variability is highly correlated.
6.2.3. Tropical Ozone
 The annual cycle in 100 hPa ozone is generally well reproduced with high JJA summer ozone. There are some differences in the absolute value of ozone. The UMUKCA models and ULAQ have significantly higher O3 at 100 hPa than observed. CNRM-ACM and UMSLIMCAT have the wrong annual cycle. Models with tropospheric chemistry (CAM3.5, EMAC, ULAQ) do not appear to perform significantly better. The multi-model mean is a good estimate of the observations.
6.2.4. Tropical Water Vapor
 UMETRAC, CNRM-ACM, ULAQ and MRI are too wet at 80 hPa, and several models (LMDZrepro, EMAC, CMAM) are too dry, with water vapor below 3 ppmv. The annual cycle is not as well reproduced, with many models shifted by 1–2 months relative to HALOE observations. The models generally reproduce the observed decrease in 80 hPa H2O from 1992–2004. With respect to the correlation between cold point temperature and water vapor, 3 models (CCSRNIES, CNRM-ACM and UMETRAC) are clear outliers: they appear to have more water vapor than their temperatures would permit if transport occurred as in the observations. The UMUKCA models prescribe TTL water vapor.
6.2.5. Tropopause Inversion Layer
 Models are able to simulate a TIL. The simulated TIL resembles observations at a similarly coarse vertical resolution, but is vertically deeper than in high vertical resolution observations. The maximum value of N2 is found at a higher altitude than observed. Higher vertical resolution does improve the model simulations. Models reproduce the annual cycle in TIL structure, with the tropical TIL slightly stronger during DJF and the extratropical TIL stronger in the summer hemisphere.
6.2.6. Lagrangian Cold Point
 The two models examined broadly reproduce the distribution of Lagrangian minimum temperatures (Tmin) in analysis systems. However, Tmin is higher than in the ERA40 reference calculation, owing to differences in transport location. H2O is high in one model (E39CA), consistent with its high Tmin, but not in the other (CMAM). Further work with more models is needed to better understand these differences.
 There is a spread of residence times in the two models, mirroring spread in analysis systems using different vertical advection. It is likely that model residence times are a stringent test of the model vertical advection schemes and schemes that are too diffusive will have short residence times.
 The results of this analysis indicate a spread in model performance in the TTL relative to observations. Four models have quantitatively better results relative to observations, and half of the models (9 of 18) perform well on most (3 of 4) grades. The multi-model mean is generally a very good representation of the TTL. Quantitative grades including variability confirm the qualitative view of the models. Further work to make the grading of models more rigorous is desirable.
 The tropical tropopause pressure and CPT exhibit significant biases between models, although the seasonal cycles are generally reasonable. This finding implies a wide range of tropical LS H2O values. However, the spread of CPT values is smaller than for CCMVal-1 models [Gettelman et al., 2009], indicating an improvement in overall model performance. The amplitude and phase of the annual cycle are improved, and all models' monthly anomalies of TCPT are within 3σ of the observations.
 Critically, many models and the multi-model mean can now broadly reproduce recently observed decreases in lower stratospheric water vapor, likely related to SST variability. Thus models can translate SST forcing into changes in lower stratospheric H2O.
 Comparison of TCPT with H2O reveals that some models simulate transport behavior different from observations: they have higher water vapor concentrations above the cold point than implied by the saturation value at TCPT. The observed mean ratio of 80 hPa water vapor to the saturation value at the cold point minimum temperature is about 0.65–0.7, and most models reproduce this ratio, which increases confidence in their TTL transport.
 Lagrangian cold points in the two models examined have a reasonable distribution but suffer from temperature biases, and the TIL is generally too deep and slightly displaced relative to observations. The representation of the TIL appears to depend on vertical resolution: observations degraded to model resolution look more like the models, and a model with higher vertical resolution (δz = 300 m in the TTL) has stability gradients that better resemble observations. Hence higher vertical resolution seems to improve the representation of stability in the TTL.
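 Static stability in the TIL is commonly expressed through the squared Brunt–Väisälä frequency, N² = (g/T)(∂T/∂z + g/c_p). The sketch evaluates it for a hypothetical idealized profile (a 190 K tropopause at 17 km with a 5 K/km inversion above) sampled at about 300 m and at 1.5 km spacing, as a stand-in for comparing high-resolution data with coarser model levels. The profile and spacings are assumptions chosen for illustration.

```python
import numpy as np

G = 9.81      # gravitational acceleration, m s^-2
CP = 1004.0   # specific heat of dry air at constant pressure, J kg^-1 K^-1


def idealized_temperature(z_km):
    """Hypothetical profile: 7 K/km lapse below a 17 km, 190 K tropopause,
    a 5 K/km inversion from 17 to 18.5 km, then 2 K/km above."""
    z = np.asarray(z_km, dtype=float)
    return np.where(z <= 17.0, 190.0 + 7.0 * (17.0 - z),
                    np.where(z <= 18.5, 190.0 + 5.0 * (z - 17.0),
                             197.5 + 2.0 * (z - 18.5)))


def brunt_vaisala_sq(z_km, temp_k):
    """N^2 = (g/T)(dT/dz + g/cp) at layer midpoints, in s^-2."""
    dz_m = np.diff(z_km) * 1000.0
    t_mid = 0.5 * (temp_k[1:] + temp_k[:-1])
    n2 = G / t_mid * (np.diff(temp_k) / dz_m + G / CP)
    return 0.5 * (z_km[1:] + z_km[:-1]), n2


z_fine = 10.0 + 0.3 * np.arange(51)      # ~300 m spacing, 10-25 km
z_coarse = 10.0 + 1.5 * np.arange(11)    # 1.5 km spacing, 10-25 km
zf, n2_fine = brunt_vaisala_sq(z_fine, idealized_temperature(z_fine))
zc, n2_coarse = brunt_vaisala_sq(z_coarse, idealized_temperature(z_coarse))
```

On the coarse grid the inversion is averaged with the layers around it, so the peak N² just above the tropopause is weaker than on the fine grid.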
 Simulations indicate significant impacts of stratospheric O3 depletion on historical and future trends in extratropical tropopause pressure and on historical and future O3 trends in the extratropical LMS. NH and SH future trends are very different, and SH trends are not monotonic due to O3 depletion and recovery. Ozone depletion strengthens the trends in the SH, and recovery weakens the trends. This is consistent with other recent analyses with CCMVal-1 models [Son et al., 2009b]. Extratropical LMS O3 trends may impact O3 concentrations in the TTL through quasi-isentropic transport. Extratropical PTP trends are indicators of shifts in the subtropical jets and circulation that may impact the tropics, for example by increasing the width of the tropical belt [Seidel et al., 2008].
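 Trends that reverse, like the SH trends driven first by ozone depletion and then by recovery, are poorly described by a single straight line. One common option, assumed here rather than taken from this study, is a continuous piecewise-linear ("hockey stick") least-squares fit with a turning year chosen in advance; choosing that year is itself an assumption, and the series below is synthetic.

```python
import numpy as np


def hinge_trend(years, values, turning_year):
    """Continuous piecewise-linear least-squares fit with a fixed turning year.

    Returns (slope_before, slope_after) in units of `values` per year.
    """
    t = np.asarray(years, dtype=float) - turning_year
    y = np.asarray(values, dtype=float)
    hinge = np.maximum(t, 0.0)                    # zero before the turning year
    design = np.column_stack([np.ones_like(t), t, hinge])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], coef[1] + coef[2]


# Synthetic illustration: decline of 0.5 units/yr to 2000, recovery of 0.3 units/yr after
years = np.arange(1960, 2100)
synthetic = np.where(years < 2000, 100.0 - 0.5 * (years - 2000), 100.0 + 0.3 * (years - 2000))
before, after = hinge_trend(years, synthetic, 2000)
```

Opposite signs of `before` and `after` indicate a non-monotonic trend of the kind described for the SH.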
 The projected O3 increase in the NH extratropical LMS is nearly 30% by the end of the 21st century. This increase is not due to tropospheric chemistry; it is most likely due to increased downwelling from an enhanced Brewer-Dobson circulation (BDC) and to the effects of ozone recovery, as also noted by Hegglin and Shepherd and by Li et al. These changes might affect the tropopause structure, radiative forcing calculated at the tropopause, the stratosphere-troposphere exchange of ozone, and upper tropospheric ozone. Understanding the mechanisms for this increase using CCMs with tropospheric chemistry is a critical future endeavor [Hegglin and Shepherd, 2009; Stevenson, 2009].
 Future increases in tropical ozone with respect to the tropopause also strongly imply changes to TTL transport that might affect short-lived species (for example, those containing bromine). Future CCM simulations should include a suite of short-lived compounds to better evaluate TTL transport and chemistry.
 Simulations show good historical fidelity with observed trends and anomalies in PTP. Models do not reproduce historical TCPT trends, but these trends are themselves uncertain in reanalyses. Models project decreases in tropical PTP in the 21st century. Simulated quantitative trends in PTP are similar to those found by Gettelman et al. with a small subset of CCMVal-1 models run to 2100. The quantitative values quoted are for the 4 models with high quantitative grades, which gives higher confidence in these results than in earlier analyses.
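 Trends quoted per century can be obtained from an ordinary least-squares fit to annual means. The sketch assumes independent residuals, so its standard error is optimistic for autocorrelated series; it is not necessarily the trend method used in this analysis.

```python
import numpy as np


def linear_trend_per_century(years, values):
    """Ordinary least-squares trend and its standard error, both per 100 years.

    Assumes independent residuals; autocorrelated series need a larger error.
    """
    t = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    tc = t - t.mean()
    slope = np.sum(tc * (y - y.mean())) / np.sum(tc ** 2)
    resid = (y - y.mean()) - slope * tc
    stderr = np.sqrt(np.sum(resid ** 2) / (t.size - 2) / np.sum(tc ** 2))
    return 100.0 * slope, 100.0 * stderr
```

A trend is often treated as significant when its magnitude exceeds about twice its standard error; for autocorrelated annual means that test should use an error inflated for the reduced number of independent samples.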
 Models reproduce the recent decreases in H2O seen in reanalyses and HALOE observations, which yields confidence in their future trends. Increasing H2O in the tropical lower stratosphere is associated with increasing TCPT and decreasing PTP. Changes over 2000–2100 are significant: nearly +1 K in TCPT and +1 ppmv of water vapor, a 20–30% increase. Some spread remains in the model results, but most trend outliers are traceable to known model deficiencies that show up as low performance on some diagnostics.
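 As a rough consistency check on these numbers (not a calculation from this study), assume that water vapor entering the stratosphere scales with the ice-saturation mixing ratio at the cold point. Integrating the Clausius–Clapeyron relation with a constant latent heat of sublimation, itself an approximation, gives the fractional change for a given warming.

```python
import math

L_SUB = 2.834e6  # latent heat of sublimation, J/kg (treated as constant: an approximation)
R_V = 461.5      # gas constant for water vapor, J/(kg K)


def saturation_change_factor(t_k, delta_t, pressure_ratio=1.0):
    """Factor by which the ice-saturation mixing ratio changes when the cold point
    warms from t_k by delta_t and its pressure is multiplied by pressure_ratio."""
    # Integrated Clausius-Clapeyron: e2/e1 = exp[(L/Rv)(1/T1 - 1/T2)]
    vapor_pressure_factor = math.exp(L_SUB / R_V * (1.0 / t_k - 1.0 / (t_k + delta_t)))
    # Mixing ratio ~ e/p, so a lower cold point pressure (ratio < 1) raises it further
    return vapor_pressure_factor / pressure_ratio


factor = saturation_change_factor(190.0, 1.0)
print(f"{100 * (factor - 1):.0f}% increase")  # prints: 18% increase
```

About 18% for a 1 K warming near 190 K is the same order as the 20–30% quoted above; a simultaneous decrease in cold point pressure would add to it.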
 However, there is little spatial coherence across models in the structure of historical or future trends in water vapor (and temperature), beyond a common link to the parameterized process of deep cumulus convection. Water vapor increases strongly in the future in the lower TTL near 200 hPa. Consistent with this picture, TTL cloudiness increases significantly (35% over 1960–2100) in the one family of models with cloud fields to 2100. Improving confidence in convective parameterization and its effect on tropical atmospheric dynamics and thermodynamics is therefore critical for improving confidence in predictions of the future state of the TTL, both for transport into the stratosphere and for radiative effects on surface climate.
 What has changed since CCMVal-1 [Gettelman et al., 2009]? First, there are many more models to analyze, so the multi-model mean is more robust. Second, the spread of TCPT has narrowed. Third, historical runs now simulate the modest recent decreases in lower stratospheric H2O seen in observations, which increases confidence in future trends in TCPT and H2O. Fourth, a limited subset of models now gives a much more detailed picture of the simulated thermal structure of the TTL (the TIL) and of transport through the TTL. Many models still have deficiencies in TCPT and TTL transport, but the quantitative assessment indicates that at least half of the models perform acceptably in the TTL.
 The strongest overall recommendations for improving the representation of the TTL in CCMs are (1) increasing vertical resolution and (2) adding tropospheric chemistry and short-lived species. In addition, making limited high-frequency output available (for trajectory studies) would extend the possible process-based analysis.
 The National Center for Atmospheric Research is sponsored by the United States National Science Foundation. The work of N. Butchart and S. Hardiman was supported by the Joint DECC and Defra Integrated Climate Programme - DECC/Defra (GA01101). We acknowledge the modeling groups for making their simulations available for this analysis, the Chemistry-Climate Model Validation (CCMVal) Activity for WCRP's (World Climate Research Programme) SPARC (Stratospheric Processes and their Role in Climate) project for organizing and coordinating the model data analysis activity, and the British Atmospheric Data Center (BADC) for collecting and archiving the CCMVal model output. CCSRNIES research was supported by the Global Environmental Research Fund of the Ministry of the Environment of Japan (A-071). CCSRNIES and MRI simulations were made with the supercomputer at the National Institute for Environmental Studies, Japan. European contributions were supported by the European Union Integrated Project SCOUT-O3. WACCM-hires simulations were performed at the Centro de Supercomputacion de Galicia. Thanks go to Darryn Waugh for the use of code for producing tables.