On the lack of stratospheric dynamical variability in low-top versions of the CMIP5 models


Corresponding author: A. J. Charlton-Perez, Department of Meteorology, University of Reading, Reading, UK. (a.j.charlton@reading.ac.uk)


[1] We describe the main differences in simulations of stratospheric climate and variability between two groups of models within the fifth Coupled Model Intercomparison Project (CMIP5): those that have a model top above the stratopause and relatively fine stratospheric vertical resolution (high-top), and those that have a model top below the stratopause (low-top). Although the simulation of mean stratospheric climate by the two model ensembles is similar, the low-top model ensemble has very weak stratospheric variability on daily and interannual time scales. The low-top models strongly underestimate the frequency of major sudden stratospheric warming events, producing fewer than half as many events as are found in the reanalysis data and in the high-top models. The lack of stratospheric variability in the low-top models affects their stratosphere-troposphere coupling, resulting in short-lived anomalies in the Northern Annular Mode that, unlike those in observations, do not produce long-lasting tropospheric impacts. This lack of variability does not, however, appear to affect the ability of the low-top models to reproduce past stratospheric temperature trends. We find little improvement in the simulation of decadal variability by the high-top models compared to the low-top models, which is likely related to the fact that neither ensemble produces a realistic dynamical response to volcanic eruptions.

1 Introduction

[2] One major change in coupled-climate modeling between the third (CMIP3) and fifth (CMIP5) Coupled Model Intercomparison Projects is an increase in the number of models with model tops above the stratopause and general progress toward a more realistic representation of the stratosphere in coupled climate models. As an example of this trend, only 5 of the 23 CMIP3 models considered in Cordero and Forster [2006] had tops above 1 hPa. In the CMIP5 archive, this ratio has increased to 15 of 45 models. Furthermore, very few models now place their model lid in the middle stratosphere near 10 hPa, thereby reducing the number of models that are likely to severely distort stratospheric dynamics. In the present study, we seek to understand the benefits of a model lid above 1 hPa for the simulation of stratospheric climate and variability by comparing the simulation of stratospheric climate by a subset of models submitted to the CMIP5 archive.

[3] The move to increased stratospheric representation in coupled climate models has been motivated in part by the large body of recent work providing evidence that both internal stratospheric climate variability and external stratospheric climate forcing can be important drivers of tropospheric climate (as discussed by Gerber et al. [2012]). The climate models that make up the CMIP5 ensemble might be crudely divided into two subensembles: one representing models that attempt to fully represent stratospheric processes (for which we use the shorthand “high-top”) and the other representing models that do not (for which we use the shorthand “low-top”).

[4] In this study, we attempt the first broad-scale assessment of the performance of the high-top and low-top ensembles in CMIP5 in simulating stratospheric climate and variability. Many previous assessments of the simulation of stratospheric climate by stratosphere-resolving models [Pawson et al., 2000; Cordero and Forster, 2006] and chemistry-climate models [Austin et al., 2003; Eyring et al., 2006; Butchart et al., 2011] have shown an improving simulation of stratospheric mean climate over time. However, several persistent biases remain in most models. One example is the cold bias in spring temperatures in the polar lower stratosphere, associated with a delay in the stratospheric final warming. This bias is present in both hemispheres and in high-top and low-top models. The key questions in this study are whether biases in the lower and middle stratosphere are reduced in high-top models compared to low-top models and whether low-top models exhibit additional stratospheric biases.

[5] Ultimately, for most climate modeling centers the value of enhancing the stratospheric representation in their models will be measured in terms of any improvement to tropospheric biases and variability and the simulation of tropospheric climate change. Several studies have shown that the pattern and magnitude of regional tropospheric climate change can be significantly affected by changes in the stratospheric climate [e.g., Sigmond and Scinocca, 2010; Scaife et al., 2011; Karpechko and Manzini, 2012]. In a companion paper, Manzini et al. [2012] examine how the multimodel simulation of tropospheric climate change in CMIP5 is sensitive to stratospheric climate change. A key prerequisite to this analysis is the diagnosis of the simulation of stratospheric climate and variability presented in this paper.

2 High-Top and Low-Top Models in the CMIP5 Ensemble

[6] The subset of models from the CMIP5 experiment considered in this study is listed in Table 1. In considering this large ensemble, it is clear that the CMIP5 models have a wide variety of lid heights, vertical resolutions, and parameterized physical processes in the stratosphere. The models were classified into a high-top and a low-top ensemble based primarily upon their lid height, with the threshold between high-top and low-top at 1 hPa. This choice is motivated by previous studies [Cagnazzo and Manzini, 2009; Maycock et al., 2011; Shaw and Perlwitz, 2010], which have suggested that models with a top below the stratopause fail to properly simulate episodic stratospheric variability such as stratospheric sudden warmings. We might therefore expect a similar difference between models in the CMIP5 ensemble segregated by this threshold.

Table 1. Models Used in the Study and Their Stratospheric Properties. An N in the Stratospheric Physics Column Indicates Nonorographic Gravity Wave Drag is Included, a C Indicates Stratospheric Heterogeneous Chemistry is Included, a Line Indicates Neither is Included
Model           Lid height           Levels  Levels above 200 hPa  Strat. physics  Ensemble
BCC-CSM1.1      2.917 hPa            26      13                    -               Low
CCSM4           2.194067 hPa         27      13                    -               Low
CNRM-CM5        10 hPa               31      9                     -               Low
CSIRO-mk3.6.0   4.5 hPa              18      5                     -               Low
GFDL CM3        0.01 hPa             48      28                    NC              High
GFDL-ESM2G      3 hPa                24      5                     -               Low
GFDL-ESM2M      3 hPa                24      5                     -               Low
GISS-E2-R       0.1 hPa              40      19                    NC              High
GISS-E2-H       0.1 hPa              40      19                    NC              High
HadCM3          10 hPa               19      3                     -               Low
HadGEM2-ES      40 km                38      15                    -               Low
HadGEM2-CC      85 km                60      37                    N               High
INMCM4          10 hPa (0.01 sigma)  21      8                     N               Low
IPSL-CM5A-LR    0.04 hPa             39      22                    N               High
IPSL-CM5A-MR    0.04 hPa             39      22                    N               High
MIROC-ESM       0.0036 hPa           80      63                    N               High
MIROC-ESM-CHEM  0.0036 hPa           80      63                    NC              High
MIROC5          3 hPa                40      17                    -               Low
MPI-ESM-LR      0.01 hPa             47      25                    N               High
MRI-CGCM3       0.01 hPa             48      25                    NC              High
NorESM1-M       3.54 hPa             26      13                    -               Low

[7] Because this study is primarily concerned with assessing how the models reproduce past observed climatology, our focus is solely on the historical runs of the CMIP5 models, which have observed climate forcings (including forcing from greenhouse gases, ozone depletion, land-use change, tropospheric and stratospheric aerosols, and solar variability). More details of the CMIP5 experimental design can be found in Taylor et al. [2012]. Typically, model simulations are compared with the MERRA [Rienecker et al., 2011] and ERA-Interim [Dee et al., 2011] reanalysis data sets over the modern satellite era (1979 to present), when confidence in stratospheric reanalyses is highest, but for some diagnostics that require longer climate records, the ERA-40 reanalysis data set is used as an additional or alternative point of reference. The methodology and the reanalysis data set used for each diagnostic are described in each subsection.

[8] To put our results in context with other generations and types of GCMs, we compare some of the diagnostics with those from the CMIP3 [Meehl et al., 2007] and CCMVal-2 [SPARC CCMVal, 2010] ensembles. The CMIP3 models, which were developed in the early 2000s, generally have a lower vertical resolution than the CMIP5 models. The CCMVal-2 models are coupled chemistry-climate models, which are stratosphere resolving and have a vertical resolution comparable to the CMIP5 high-top ensemble, but are mostly run without coupling to the ocean. There is also a degree of commonality between the CCMVal-2 and CMIP5 models, because many share a similar or identical dynamical core.

[9] In section 3, we assess stratospheric climate by first calculating broad-scale metrics of the overall skill of the two model ensembles in simulating stratospheric climate and variability, following the work of Reichler and Kim [2008]. This approach allows us to directly and compactly assess the overall skill of the two ensembles and to compare them with the CMIP3 and CCMVal-2 ensembles. However, the broad-scale metric does not help us to understand the reasons for differences in the performance of the two CMIP5 ensembles. Hence, in section 4 we use a small selection of process-based diagnostics to illustrate why the model ensembles produce a similarly skillful simulation of mean climate but a very different simulation of stratospheric variability. In section 5, we then assess the impact of the lack of stratospheric variability in the low-top ensemble on the simulation of stratosphere-troposphere coupling and the reproduction of past stratospheric trends by the models.

[10] It is not possible to construct a consistent high-top and low-top ensemble for all of the diagnostics of stratospheric climate presented in this study and the supporting information. The coverage of different diagnostics in the CMIP5 archive is quite variable and hence we have chosen to construct the largest possible ensemble for each of the diagnostics available. The models and ensemble members used for each diagnostic are shown in Table 2. Although this approach is far from ideal, it does allow us to make a broad-scale assessment of the simulation of stratospheric climate by the high-top and low-top ensembles within the CMIP5 archive. In almost all cases, each ensemble is made up of a large number of different GCMs and the removal of individual models from the ensemble does not change the qualitative structure of the high-top or low-top ensemble mean or their difference where this is significant.

Table 2. Membership of High- and Low-Top Ensembles for Each Section(a)
  (a) Because not all models provide all diagnostics to the archive, the high-top and low-top ensembles have a varying composition in each section, as shown here. Numbers indicate the number of ensemble members used in each case; "Total models (EM)" gives the number of models, with the total number of ensemble members in parentheses. The column abbreviations are: Metric, a broad metric of stratospheric climate (Figure 1); Zon. Temp., Zonal Mean Temperature Bias (Figure 2); Trends, Trends in Lower Stratospheric Temperature (Figure 6); SSWs, Sudden Stratospheric Warmings (Figure 3); AM, Annular Modes (Figure 5); Volc., Response to Volcanic Eruptions (Figure 4).
Model               Metric      Zon. Temp.  Trends      SSWs        AM          Volc.
High-top models
GFDL CM3            1           5           -           1           1           5
Total models (EM)   10(10)      9(28)       8(30)       7(14)       8(15)       8(28)
Low-top models
Total models (EM)   9(9)        9(28)       6(22)       9(27)       6(11)       7(40)

3 A Broad Metric of Stratospheric Climate

[11] A simple way to assess the performance of the CMIP5 models in the stratosphere and to compare them with previous generations of models is to compute broad-scale climate metrics. In this section, model performance is evaluated in terms of zonal averages of temperature (T), zonal wind (U), and specific humidity (q). The analysis domain for the model metrics extends from 90°S to 90°N and from 100 to 10 hPa. Some models in the low-top ensemble may apply artificial numerical momentum damping near the model top that extends into this domain, but this information is not provided in the CMIP5 metadata. Because such information is important for future evaluation of model performance in the stratosphere, we recommend that future multimodel intercomparison experiments include it in their metadata.

[12] Four different aspects of climate are examined: long-term mean (MEAN), and variability on synoptic (DAILY), interannual (INTA), and decadal (DCDL) time scales. All four aspects are calculated individually for the four seasons. The calculation of mean climate as well as interannual and decadal variability is based on monthly mean input data. Synoptic variability is based on daily data after removing a low-frequency component with a temporal smoother using Gaussian weighting with a full-width at half maximum of 15 days [Baldwin et al., 2003].
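A minimal Python sketch of the synoptic filtering step, assuming a 1-D daily series for a single grid point. The Gaussian smoother uses the stated 15-day full-width at half maximum; truncating the kernel at ±3σ and reflect-padding the series ends are our own assumptions, not details given here or by Baldwin et al. [2003].

```python
import numpy as np


def gaussian_weights(fwhm_days: float) -> np.ndarray:
    """Normalized Gaussian kernel with the given full-width at half maximum (days)."""
    # For a Gaussian, FWHM = 2 * sqrt(2 ln 2) * sigma.
    sigma = fwhm_days / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Truncate the kernel at +/- 3 sigma (assumption; not specified in the text).
    half_width = int(np.ceil(3.0 * sigma))
    t = np.arange(-half_width, half_width + 1, dtype=float)
    w = np.exp(-0.5 * (t / sigma) ** 2)
    return w / w.sum()


def synoptic_component(daily: np.ndarray, fwhm_days: float = 15.0) -> np.ndarray:
    """High-pass a 1-D daily series by subtracting its Gaussian-smoothed version."""
    daily = np.asarray(daily, dtype=float)
    w = gaussian_weights(fwhm_days)
    pad = len(w) // 2
    # Reflect-pad so the smoother is defined on every day (assumption).
    padded = np.pad(daily, pad, mode="reflect")
    low_freq = np.convolve(padded, w, mode="valid")
    return daily - low_freq


# DAILY variability would then be the standard deviation of the residual, e.g.
# daily_sigma = synoptic_component(u_daily).std()
```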

[13] Variability is defined as the standard deviation over the given period of years. For example, interannual variability is the standard deviation of seasonal means, and decadal variability is the standard deviation of band-pass filtered monthly anomalies, using the fast Fourier transform technique and only retaining periods between 5 and 15 years. Decadal variability is calculated for the period 1961–2000, using the ERA-40 reanalysis as validation data. Mean climate, interannual variability, and daily variability are all based on 1979–2000 data and validated against ERA-Interim reanalysis. The model data are taken from the 20C3M (CMIP3), REF-B1 (CCMVal-2), and HISTORICAL (CMIP5) experiments; only one ensemble member is included from each model to avoid biases in the calculation of the metric toward models with larger ensembles.
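The decadal band-pass can be sketched as below, assuming an evenly spaced monthly anomaly series with no gaps. Fourier coefficients outside the 5–15 year band are simply set to zero; this is one straightforward FFT implementation and not necessarily the exact filter used to produce Figure 1.

```python
import numpy as np


def bandpass_fft(monthly_anom, min_period_yr=5.0, max_period_yr=15.0):
    """Keep only periods between min_period_yr and max_period_yr (monthly input)."""
    x = np.asarray(monthly_anom, dtype=float)
    n = x.size
    spec = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n, d=1.0 / 12.0)  # cycles per year (12 samples per year)
    keep = (freqs >= 1.0 / max_period_yr) & (freqs <= 1.0 / min_period_yr)
    spec[~keep] = 0.0                          # zero everything outside the band
    return np.fft.irfft(spec, n=n)


def decadal_variability(monthly_anom):
    """DCDL: standard deviation of the band-passed monthly anomalies."""
    return bandpass_fft(monthly_anom).std()
```

For a 40-year record (1961–2000, 480 months) the frequency spacing is 1/40 cycles per year, so only a handful of Fourier coefficients fall inside the band, which is one reason decadal statistics from such a short record are noisy.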

[14] We examine two different measures of error: the pattern correlations (r) and the normalized root mean square error (E) [Reichler and Kim, 2008]. The procedure to compute E follows the method employed in Chapter 10 of SPARC CCMVal [2010]. In short, we first square the grid-point error between simulated and observed climate, normalize on a grid-point basis with the observed interannual variance, average spatially over a certain domain, and then take the square root. The grid-point error in simulating variability is based on the log2 variability ratio between model and observation. Because the computed values of E are nondimensionalized, errors from different climate quantities can be combined into a single measure of overall model performance. The values of r and E are calculated separately for each quantity, season, and model. We then take appropriate averages, e.g., for the four seasons and the three quantities (T, U, and q).
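The error measures can be sketched for a latitude-pressure field as follows. The cos(latitude) area weighting, the centred form of the pattern correlation, and the use of the log2 ratio directly as the (already nondimensional) grid-point error for variability are assumptions for illustration; Chapter 10 of SPARC CCMVal [2010] gives the exact definitions.

```python
import numpy as np


def _weighted_mean(field, weights):
    """Area-weighted mean; weights are broadcast to the field's shape."""
    w = np.broadcast_to(weights, field.shape)
    return np.sum(w * field) / np.sum(w)


def normalized_rmse(model, obs, obs_interannual_var, weights):
    """E for mean climate: sqrt of the weighted mean of (model - obs)^2 / obs variance."""
    sq_err = (model - obs) ** 2 / obs_interannual_var
    return np.sqrt(_weighted_mean(sq_err, weights))


def variability_error(model_std, obs_std, weights):
    """E for variability, using log2(model_std / obs_std) as the grid-point error."""
    err = np.log2(model_std / obs_std)
    return np.sqrt(_weighted_mean(err ** 2, weights))


def pattern_correlation(model, obs, weights):
    """Weighted, centred pattern correlation r between two fields."""
    dm = model - _weighted_mean(model, weights)
    do = obs - _weighted_mean(obs, weights)
    cov = _weighted_mean(dm * do, weights)
    return cov / np.sqrt(_weighted_mean(dm ** 2, weights) * _weighted_mean(do ** 2, weights))
```

With fields shaped (levels, latitudes), `weights = np.cos(np.deg2rad(lat))[None, :]` broadcasts over the levels. Because E is dimensionless, values for T, U, and q can be averaged directly, as described above.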

[15] Figure 1 summarizes the outcome of the validation exercise. Shown are the mean values of r and E for T, U, and q and for the four seasons. The oval shapes show the two-standard-deviation uncertainty intervals for the mean performance of the different model groups and aspects of climate, obtained by bootstrapping results from individual models within each ensemble. The clustering and location of the same-colored ovals indicate that the simulation of synoptic variability (green) is generally associated with the highest skill scores. On the other hand, model performance is lowest for the simulation of decadal variability (light blue; note that this includes both forced and unforced variability), which may be partly related to the relatively short, 40-year data record and to the uncertainty in observing this aspect of climate (particularly in the Southern Hemisphere (SH)).

Figure 1.

Simulation performance (90°S–90°N, 100–10 hPa) for different model ensembles and aspects of climate. Best performing ensembles are located at the lower left. Gray contours show the skill score S (in %), which combines E and r into a single index [SPARC CCMVal, 2010]. Oval shapes indicate ± 2 standard deviation uncertainty intervals, derived by bootstrapping results from individual models within a specific ensemble (estimates of the ensemble mean uncertainty were derived by resampling the existing estimates with replacement). C5H is the CMIP5 high-top model ensemble, C5L is the CMIP5 low-top model ensemble, CV2 is the CCMVal-2 model ensemble, and C3 is the CMIP3 model ensemble. MEAN is the skill of the mean climate simulation, INTA is the skill of the interannual variability, DAILY is the skill of the daily variability, and DCDL is the skill of the decadal variability.
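The ensemble-mean uncertainty described in the caption (resampling the per-model estimates with replacement) can be illustrated with a one-dimensional bootstrap. This sketch treats E and r separately; how the two resampled quantities are combined into an oval is not specified here, so the functions are only an assumed building block.

```python
import numpy as np


def bootstrap_mean_std(values, n_boot=10000, seed=0):
    """Mean of per-model scores and the bootstrap standard deviation of that mean."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one resample of the models, drawn with replacement.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    return values.mean(), boot_means.std(ddof=1)


def two_sigma_interval(values, **kwargs):
    """Mean +/- 2 bootstrap standard deviations, as plotted for each ensemble."""
    mean, sd = bootstrap_mean_std(values, **kwargs)
    return mean - 2.0 * sd, mean + 2.0 * sd
```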

[16] It is most notable that the high-top CMIP5 ensemble (thick solid line) simulates all three time scales of climate variability considerably better than its low-top CMIP5 counterpart (thin solid line). The uncertainty ovals for low-top and high-top CMIP5 models are well separated from each other. In other words, models with an increased vertical resolution and a higher model top presumably resolve stratospheric processes better, which leads to improved simulations of stratospheric climate variability. The mean climate (orange) of the high-top ensemble has a slightly better correlation with the reanalysis than that of the low-top ensemble (i.e., it better reproduces the horizontal and vertical structure of the climate), but the root mean square error is comparable (i.e., the size of climate biases is broadly similar).

[17] Of the four model ensembles considered, the CMIP3 ensemble (thin dashed line) has the worst performance in simulating mean climate and interannual variability. When considering daily and decadal variability the CMIP3 and CMIP5 low-top ensembles have comparable performance, which is significantly worse than either the CMIP5 high-top ensemble or the CCMVal-2 ensemble.

[18] Comparing the two high-top ensembles (CMIP5 high-top and CCMVal-2 (thick dashed line)), one finds that CCMVal-2 simulates decadal and interannual climate variability significantly better, but that the simulation quality for daily variability and mean climate is essentially the same. However, when interpreting these results it is important to note that most CCMVal-2 models are run with observed SST forcing and that many CCMVal-2 models include the quasi-biennial oscillation (QBO), an important phenomenon of interannual variability in the tropical stratosphere. In most cases, however, the QBO in these models results from “nudging” toward observations and thus does not represent a true simulation, whereas CMIP5 models do not use such nudging. Nudging the QBO mostly improves the simulation of the CCMVal-2 models over the tropics, whereas over the extratropics the CCMVal-2 and CMIP5 models perform very similarly (not shown). Finally, most of the models in the CMIP5 ensemble are run with a prescribed stratospheric ozone field, potentially reducing intermodel spread in comparison to the CCMVal-2 models, which have their own internally generated ozone fields.

[19] This analysis clearly shows that there are significant differences in the simulation of climate variability in the lower stratosphere between the high-top and low-top ensembles. The aim of the remainder of the study is to analyze the stratosphere in the two model ensembles in more detail to discover the origin of these differences.

4 Possible Causes of the Differing Performance of the Two Ensembles

4.1 Mean Climate

[20] The similarity of the mean temperature biases in the two model ensembles can be shown by simply calculating the difference between the multimodel mean, zonal-mean, annual-mean temperature as a function of latitude and pressure and the same quantity in the ERA-Interim reanalysis data set (Figure 2). In a procedure separate from that used to calculate the model metrics above, each model's climatology is determined for individual realizations and then averaged over all available ensemble members. The resulting temperature field is interpolated onto T42 latitudes and standard pressure levels and averaged over all available models within the high-top and low-top ensembles. Both model ensembles have biases in the region of the tropopause, with warm biases near the tropical tropopause (around 100 hPa) and cold biases near the extratropical tropopause (near 250 hPa). These differences are consistent with a low bias in tropical tropopause heights and a high bias in extratropical tropopause heights, with somewhat stronger biases in the low-top models (see Figures S1 and S5 in the supporting information for details). One difference between the two model ensembles is in the high-latitude middle stratosphere (10 to 30 hPa), where the low-top model ensemble has large cold biases. This is consistent with previous generations of low-top models [e.g., Cordero and Forster, 2006]. In the Northern Hemisphere (NH) this bias is generally thought to be associated with a lack of episodic stratospheric dynamical variability driven by wave-mean flow interactions, and the next section explicitly shows that there is a clear distinction between the high-top and low-top ensembles in CMIP5 in this regard. In the SH, where such episodic dynamical variability is weaker, differences in the indirect heating by the residual circulation driven by the gravity wave drag parameterization may contribute to the large cold bias in the low-top ensemble. We cannot explicitly address this issue because tendencies from the gravity wave parameterizations of the CMIP5 models are not available.
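The averaging order described in paragraph [20] (ensemble members first, then models, so each model counts once) can be sketched as follows. The T42 latitudes are taken as the 64 Gaussian latitudes; vertical interpolation to standard pressure levels is omitted, and the dictionary layout of the input is a hypothetical convenience.

```python
import numpy as np


def t42_latitudes():
    """The 64 Gaussian latitudes of a T42 grid, in degrees from south to north."""
    nodes, _ = np.polynomial.legendre.leggauss(64)
    return np.rad2deg(np.arcsin(nodes))


def regrid_lat(field, lat_in, lat_out):
    """Linearly interpolate a (nlev, nlat) field to new latitudes, level by level."""
    order = np.argsort(lat_in)
    lat_sorted = np.asarray(lat_in)[order]
    return np.stack([np.interp(lat_out, lat_sorted, row[order]) for row in field])


def multimodel_mean(models, lat_out):
    """models: hypothetical dict, name -> (list of member climatologies, model latitudes)."""
    per_model = []
    for members, lat_in in models.values():
        model_clim = np.mean(members, axis=0)   # average over realizations first
        per_model.append(regrid_lat(model_clim, lat_in, lat_out))
    return np.mean(per_model, axis=0)           # then weight every model equally
```

The bias in Figure 2 would then be this multimodel mean minus the reanalysis climatology on the same grid. With this ordering, a model that submitted five realizations carries the same weight as a model that submitted one.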

Figure 2.

Zonal mean annual mean temperature difference between the multimodel average temperature and the ERA-Interim reanalysis over the period 1979–2000 for (a) the high-top models and (b) the low-top models. Contour interval is 1 K and zero lines are shown in bold grey.

[21] More detailed analysis of the mean temperature and zonal wind biases shown in the supporting information (Figures S1 and S2) reveals that the model ensembles diverge most in the extratropical middle stratosphere in the seasons in which the stratosphere is dynamically active (DJF in the NH and JJA and SON in the SH). This suggests that differences in mean climate are largely associated with differences in dynamical variability between the two ensembles.

[22] There are also other potentially important differences between the mean climate of the high-top and low-top ensembles, which could provide interesting examples for further study. One point of difference is in the strength of the tropical upwelling. Although both ensembles reasonably capture the Brewer-Dobson circulation, the low-top models produce anomalously strong upwelling at the equator but with a narrower tropical pipe (Figure S3a). A second difference concerns the well-known and persistent late bias in the final warming of the polar vortex in both hemispheres. Although both the high-top and low-top ensembles exhibit this bias, the high-top ensemble does show some improvement compared to the low-top ensemble (Figure S4). Finally, all models underestimate the pole-to-equator contrast in tropopause height (Figure S5), but this bias does appear to be somewhat alleviated in the high-top models.

[23] In summary, aside from the region near the model top, the high-top and low-top ensembles have very similar mean climate biases.

4.2 Daily and Interannual Variability

[24] As shown in Figure 1 there are large differences in the skill of the simulation of stratospheric daily variability between the high-top and low-top ensembles. This section shows that these differences are the result of a significant lack of episodic stratospheric variability in the low-top models.

[25] Stratospheric sudden warming events (SSWs) are the most dramatic examples of wintertime, extratropical stratospheric variability and are often followed by large perturbations to the tropospheric flow [e.g., Baldwin and Dunkerton, 2001]. To properly represent stratospheric climate variability, a climate model should be expected to simulate stratospheric warmings at approximately the same frequency as long-term reanalysis data sets and with a similar climatological distribution. Charlton and Polvani [2007] showed that, on an event by event basis, SSWs contribute to the daily variability in the middle and lower stratosphere and the interannual variability in the lower stratosphere. We confirm at the end of this section that, generally, models with a larger frequency of major SSW events also have larger daily and interannual zonal mean zonal wind variance in the middle stratosphere.

[26] There are many different ways in which the occurrence of SSW events in the stratosphere can be detected. In the present analysis, we use the algorithm of Charlton and Polvani [2007], based on measuring the number of times that the zonal mean zonal wind at 60°N and 10 hPa crosses zero during midwinter, which has been used to evaluate SSW occurrence in both reanalysis data sets and a large number of previous GCM studies. While other studies have suggested potential modifications to the algorithm or alternative algorithms, we retain the algorithm in its original form to allow ease of comparison with other studies. In addition, there is growing evidence that substantial decadal variability in SSW occurrence may exist [Schimanke et al., 2010], and any analysis that seeks to characterize the SSW climatology of GCMs needs to take this into account. Therefore, in this analysis we use the period 1960–2005 for the models (2005 being the end of the historical simulations), so that a moderately large sample of SSW events in both models and reanalysis data sets can be considered. The calculated model SSW frequency is compared to the frequency from the ERA-40 reanalysis for the period 1958–2001. Previous studies of high-top models, both with and without coupled stratospheric chemistry [Charlton et al., 2007; Butchart et al., 2011], have highlighted the wide spread in the simulation of SSW frequency.
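A sketch of the detection step, assuming a daily series of zonal-mean zonal wind at 60°N and 10 hPa with matching `datetime.date` values. Beyond the zero-crossing criterion stated above, the 20-day westerly separation between events and the requirement that westerlies recover for 10 consecutive days before 30 April (which excludes final warmings) follow our reading of Charlton and Polvani [2007]; these thresholds should be checked against that paper before any serious use.

```python
import datetime as dt

import numpy as np

MIDWINTER_MONTHS = (11, 12, 1, 2, 3)


def _has_westerly_run(u_segment, length):
    """True if u_segment contains at least `length` consecutive westerly (u > 0) days."""
    run = 0
    for value in u_segment:
        run = run + 1 if value > 0 else 0
        if run >= length:
            return True
    return False


def detect_ssws(u, dates, min_separation=20, recovery_days=10):
    """Central dates of major midwinter SSWs from daily zonal-mean u at 60N, 10 hPa.

    `dates` must be datetime.date objects aligned with `u` (assumed input layout).
    """
    u = np.asarray(u, dtype=float)
    events = []
    separated = True  # no earlier event yet, so no separation requirement
    run = 0           # consecutive westerly days ending on the previous day
    for i in range(1, len(u)):
        run = run + 1 if u[i - 1] > 0 else 0
        if run >= min_separation:
            separated = True
        d = dates[i]
        # Central date: first easterly day after a non-easterly day, within Nov-Mar.
        if d.month not in MIDWINTER_MONTHS or not (u[i] < 0 <= u[i - 1]):
            continue
        if not separated:
            continue
        # Exclude final warmings: westerlies must recover for `recovery_days`
        # consecutive days before 30 April of the same winter.
        end = dt.date(d.year + 1 if d.month >= 11 else d.year, 4, 30)
        after = [u[k] for k in range(i + 1, len(u)) if dates[k] <= end]
        if _has_westerly_run(after, recovery_days):
            events.append(d)
            separated = False
    return events
```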

[27] Figure 3a compares the frequency of SSW events in the high-top and low-top ensembles, showing the mean decadal frequency of SSW events in each model together with an estimate of the 95% confidence interval for each frequency estimate. All the high-top models produce SSWs at a frequency consistent with the estimate from the ERA-40 climatology. On the other hand, almost all low-top models produce too few SSWs, with one model failing to produce any SSW events during the analyzed period. This difference is highlighted in the high-top and low-top ensemble averages, with high-top models on average producing more than twice as many SSW events as the low-top models.
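The per-model decadal frequency can be computed from the number of events detected in each winter. The confidence interval below uses a normal approximation based on the standard error of the per-winter counts; this is an assumption, since the method behind the intervals in Figure 3a is not described in this section.

```python
import numpy as np


def decadal_frequency(events_per_winter, z=1.96):
    """Mean SSW frequency per decade and an assumed normal-approximation 95% interval."""
    counts = np.asarray(events_per_winter, dtype=float)
    n = counts.size
    freq = 10.0 * counts.mean()
    half_width = z * 10.0 * counts.std(ddof=1) / np.sqrt(n)
    return freq, (freq - half_width, freq + half_width)


def ensemble_median(model_frequencies):
    """Median of the per-model decadal frequencies, as summarized for each ensemble."""
    return float(np.median(model_frequencies))
```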

Figure 3.

Daily and interannual variability in the models. (a) Climatological mean decadal frequency of stratospheric sudden warming events, 1960–2000 in 19 historical simulations of CMIP5 models. Colored bars show the number of SSW events per month calculated by the Charlton and Polvani algorithm, along with 95% confidence intervals for each estimate. Models shown in red are classified as high-top models, those shown in blue as low-top models. The climatological mean decadal frequency in the ERA-40 reanalysis data set is shown by the horizontal dashed black line and the 95% confidence interval for this estimate in gray. On the right of the plot, median estimates for the low-top and high-top ensembles are shown. (b) Ensemble mean total variance (full bar) and interannual variance (horizontal line crossing each bar) of the de-seasonalized zonal mean zonal wind at 60°N and 50 hPa during the period 1960–2000. The thick dashed line and thick dotted line show the total and interannual variance (respectively) for the ERA-40 reanalysis data set for the same period.

[28] It is also interesting to compare the simulations of SSWs in the two versions of the IPSL model, which differ only in their horizontal resolution. The medium-resolution (MR) version of the IPSL model produces SSWs with almost double the frequency of the low-resolution (LR) version (although note the large standard error on the MR estimate and the overlap between the confidence intervals for the LR and MR estimates). Studies with idealized models have pointed to the potential sensitivity of stratospheric dynamical variability to horizontal resolution [Scott et al., 2004], but to our knowledge this is the first evidence of that sensitivity in more comprehensive models. As a point of information, the horizontal resolution of the LR version is 1.875° × 3.75° and that of the MR version is 1.25° × 2.5°. Scott et al. [2004] found a significant underestimation of Rossby wave propagation and breaking for dynamical cores at resolutions coarser than T42 (approximately equivalent to a resolution of 2.8° × 2.8°), which is consistent with the lower SSW frequency in the low-resolution IPSL model compared with the medium-resolution version.

[29] Detailed analysis of the monthly climatology of SSW events (not shown) reveals the wide range of behavior of GCMs in this diagnostic. Although the climatological distribution of SSW events is of course subject to significant sampling uncertainty, the results are robust enough to suggest that all models, but particularly low-top models, shift the majority of their SSW events toward the end of winter and overestimate the frequency of SSW events during March.

[30] To confirm that the frequency of major SSW events is a good indicator of the wintertime daily and interannual variability in the models, in Figure 3b we plot estimates of the total and interannual variance of the de-seasonalized zonal mean zonal wind at 50 hPa and 60°N for the same models. Although there is not a one-to-one relationship between SSW frequency and variance, it is clear that models with a higher frequency of SSW events also tend to have a larger total and interannual variance. In both measures, models in the high-top ensemble are closer to the value derived from the ERA-40 reanalysis.
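The two variance measures in Figure 3b can be sketched as follows, assuming the daily winds are arranged with one row per year covering the same calendar days (leap days are not handled). Defining the interannual part as the variance of each year's mean anomaly is our assumption about how the split is made.

```python
import numpy as np


def total_and_interannual_variance(u):
    """u: (n_years, n_days) daily wind on the same calendar days in each year."""
    u = np.asarray(u, dtype=float)
    anom = u - u.mean(axis=0, keepdims=True)  # remove the calendar-day climatology
    total = anom.var()                        # variance of all daily anomalies
    interannual = anom.mean(axis=1).var()     # variance of the per-year mean anomalies
    return total, interannual
```

Under this definition the interannual variance can never exceed the total variance, matching the layout of Figure 3b, where the interannual value is drawn as a line inside each total-variance bar.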

[31] In summary, low-top models underestimate the frequency of major SSW events during midwinter, and this is the main reason for the poorer skill of these models in simulating daily and interannual variability. As mentioned earlier, this is likely linked to the cold biases in the NH winter from 10–30 hPa in the low-top ensemble, in the sense that the cold bias and the limited number of SSW events are both likely related to the weak response of the stratosphere to tropospheric wave driving (see Table S3). Given the well-known link between SSWs and seasonal variations in tropospheric climate [e.g., Baldwin and Dunkerton, 2001], it is also possible that tropospheric seasonal climate variability will differ between the low-top and high-top ensembles. We begin to explore this idea in section 5.1.

4.3 Decadal Variability

[32] A significant source of stratospheric decadal variability over the historical period studied is that due to the large volcanic eruptions of El Chichón (in 1982) and Mt. Pinatubo (in 1991) and the relative paucity of eruptions since.

[33] To assess the ability of the current generation of models to reproduce postvolcanic dynamical anomalies in the stratosphere, we compare anomalies of lower stratospheric (50 hPa) zonal mean geopotential height from the CMIP5 models and ERA-Interim. Posteruption geopotential height anomaly composites are then produced by averaging seasonal geopotential height anomalies after the 1982 El Chichón and 1991 Mt. Pinatubo eruptions. In the NH, the two winters following each eruption are averaged, while for the SH, since aerosol transport to the SH after El Chichón was quite weak, only the first spring after the El Chichón eruption is averaged together with the two springs following the Pinatubo eruption. Figure 4 shows the posteruption geopotential height anomalies for a number of CMIP5 historical simulation ensemble members (as detailed in Table 2) and from ERA-Interim in SH spring (OND) and NH winter (DJF), seasons which have been identified as showing maximum posteruption circulation anomalies.
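The compositing step can be sketched as below. The season labels (a DJF winter identified by the year of its December, an OND spring by its calendar year) and the use of the all-years mean as the reference climatology for the anomalies are assumptions; the text specifies only which post-eruption seasons enter each composite.

```python
import numpy as np

# Assumed season labels: a DJF winter is labelled by the year of its December.
NH_DJF_WINTERS = (1982, 1983, 1991, 1992)  # two winters after El Chichón and after Pinatubo
SH_OND_SPRINGS = (1982, 1991, 1992)        # one spring after El Chichón, two after Pinatubo


def posteruption_composite(seasonal_mean, years, selected):
    """seasonal_mean: (n_years, n_lat) 50-hPa zonal-mean height for one season."""
    seasonal_mean = np.asarray(seasonal_mean, dtype=float)
    years = list(years)
    # Anomalies relative to the all-years mean (assumed reference climatology).
    anom = seasonal_mean - seasonal_mean.mean(axis=0, keepdims=True)
    rows = [years.index(y) for y in selected]
    return anom[rows].mean(axis=0)
```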

Figure 4.

Averaged geopotential height anomaly at 50 hPa for (right) the two boreal winters following the El Chichón and Pinatubo eruptions in the historical simulations in the NH and for (left) one OND period following the El Chichón eruption and two OND periods following the Pinatubo eruption in the SH. Thin lines show individual models, thick lines show the ensemble average (except the black line, which shows the anomaly in the ERA-Interim reanalysis). Blue lines show low-top models and red lines show high-top models. Black dashed lines show the two-standard-error estimate for the ERA-Interim data.

[34] Both the high-top and low-top ensemble means show a similar postvolcanic response, with a modest increase in 50 hPa geopotential height over most latitudes. No CMIP5 model simulation from either the low-top or high-top ensemble reproduces the strong volcanic response in the NH winter stratosphere, a result that is consistent with the recent analysis of Driscoll et al. [2012]. In SH spring, the high-top ensemble mean 50 hPa geopotential height anomaly has a slightly positive value at high latitudes, consistent with the reanalysis, but the large ensemble spread indicates that the high-top model simulations, as an ensemble, are not significantly closer to reality than the low-top model results. The poor performance of both model ensembles in simulating decadal variability in the stratosphere (see Figure 1) is strongly tied to this deficiency, but it may also be related to the ability of the models to reproduce the smaller signal due to solar variability, which has not been tested here. The impact of the weak decadal variability in both ensembles on their simulation of past temperature trends is assessed in section 5.2.

5 The Impact of Reduced Stratospheric Variability

[35] It is clear from section 4.2 that the low-top CMIP5 model ensemble is deficient in its simulation of stratospheric variability on daily and interannual time scales. In this final section, we explore the impact of this deficiency on the models' simulation of coupling between the stratosphere and troposphere and on their simulation of stratospheric trends. We focus on these two areas because they are relevant to many different applications of CMIP5 data and should be of interest to a wide range of fellow climate scientists.

5.1 Stratosphere-Troposphere Coupling

[36] We attempt to characterize stratosphere-troposphere coupling in the models by examining the annular modes, which describe variability of the tropospheric midlatitude jets and the stratospheric polar vortex [e.g., Thompson and Wallace, 2000]. Following Baldwin and Dunkerton [2001] and Baldwin and Thompson [2009], we compute the annular mode index separately at each pressure level to explore the vertical coupling of the atmosphere. We use the procedure of Gerber et al. [2010], defining the Northern and Southern Annular Modes (NAM and SAM) as the first empirical orthogonal functions (EOFs) of daily zonal mean geopotential height anomalies in each hemisphere. The height fields are first filtered (by removing a 30 year low-pass filtered version of the time series) to remove long-term climate trends, so that the remaining anomalies reflect the natural variations of the atmosphere.
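One possible implementation of this procedure for a single pressure level is sketched below, assuming daily zonal mean height anomalies (seasonal cycle already removed) on a (days x latitudes) grid. A centered running mean stands in for the 30 year low-pass filter (the filter actually used by Gerber et al. [2010] may differ), latitudes are weighted by the square root of their cosine before the EOF is computed (a common choice that we assume here), and the sign is fixed so that a positive index corresponds to low heights over the pole, i.e., a strong vortex. The function names are hypothetical.

```python
import numpy as np


def remove_low_frequency(z, window):
    """Subtract a centered running mean of `window` samples from each column of z.

    The running mean is a stand-in for the 30 year low-pass filter (assumption).
    """
    z = np.asarray(z, dtype=float)
    kernel = np.ones(window) / window
    half = window // 2
    # Pad with edge values so the smoothed series has the same length as z
    padded = np.pad(z, ((half, window - 1 - half), (0, 0)), mode="edge")
    smooth = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, padded
    )
    return z - smooth


def annular_mode_index(z_anom, lats):
    """Standardized leading principal component of (time x latitude) anomalies.

    Latitudes are weighted by sqrt(cos(lat)) before the EOF is computed, and the
    sign is chosen so that a positive index means low heights near the pole.
    """
    z_anom = np.asarray(z_anom, dtype=float)
    lats = np.asarray(lats, dtype=float)
    x = z_anom * np.sqrt(np.cos(np.deg2rad(lats)))
    x = x - x.mean(axis=0)
    # Rows of vt are EOF patterns; u[:, 0] * s[0] is the leading PC time series
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    pc = u[:, 0] * s[0]
    # Positive index <-> negative height anomaly at the most poleward latitude
    pole = np.argmax(np.abs(lats))
    if np.corrcoef(pc, z_anom[:, pole])[0, 1] > 0:
        pc = -pc
    return (pc - pc.mean()) / pc.std()
```

For daily data, the window approximating a 30 year filter would be roughly 30 x 365 days, and the two functions would be applied level by level to build the height-dependent NAM index.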

[37] Figure 5 shows composites of the NAM index based on extreme events in the stratosphere. Negative NAM events in the stratosphere are associated with a weak polar vortex, and closely (though not entirely) correspond to stratospheric sudden warmings [Charlton and Polvani, 2007]. Figure 5a, based on ERA-40 and ERA-Interim, is an update of Baldwin and Dunkerton [2001]; following a weakening of the stratospheric polar vortex, the NAM in the troposphere tends to shift toward a negative index, associated with an equatorward shift of the midlatitude jet stream.

Figure 5.

Downward propagation of negative NAM events in ECMWF reanalyses and CMIP5 models. Events are defined as instances when the NAM index at 10 hPa drops below –3 standard deviations, and the composites are constructed from NAM anomalies in the 90 days preceding (negative lags) and following (positive lags) the dip at 10 hPa. The multimodel composites are based on all events in the CMIP5 historical simulations in either the high-top or low-top ensemble, weighted so that each model makes an equal contribution to the composite.
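The event selection and lag compositing described in the Figure 5 caption can be sketched as follows, assuming that each model's NAM index is standardized and stored as a (days x pressure levels) array. The minimum separation between events is a hypothetical rule added so that a single prolonged dip is not counted twice; it is not specified in the text. Equal weighting per model is obtained by compositing within each model first and then averaging the model composites. All function names are illustrative.

```python
import numpy as np


def find_events(nam10, threshold=-3.0, min_separation=60):
    """Days on which the standardized 10 hPa index first drops below `threshold`.

    `min_separation` (days) is a hypothetical rule so that one prolonged dip is
    not counted as several events.
    """
    below = np.asarray(nam10) < threshold
    # A crossing is a day below the threshold whose previous day was not
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    events, last = [], -np.inf
    for day in crossings:
        if day - last >= min_separation:
            events.append(day)
            last = day
    return np.array(events, dtype=int)


def lag_composite(nam, events, max_lag=90):
    """Mean of nam (days x levels) from -max_lag to +max_lag days around each event.

    Events too close to either end of the record are skipped; returns None if
    no event has a complete window.
    """
    nam = np.asarray(nam, dtype=float)
    n_days = nam.shape[0]
    usable = [e for e in events if e - max_lag >= 0 and e + max_lag < n_days]
    if not usable:
        return None
    windows = [nam[e - max_lag:e + max_lag + 1] for e in usable]
    return np.mean(windows, axis=0)  # shape (2 * max_lag + 1, levels)


def multimodel_composite(model_composites):
    """Average per-model composites so that each model has equal weight."""
    valid = [c for c in model_composites if c is not None]
    return np.mean(valid, axis=0)
```

Because each model is first reduced to a single composite, a model with many events cannot dominate the multimodel mean, which is the weighting described in the caption.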

[38] Figures 5b and 5c suggest a difference in the response of the high- and low-top ensembles to extreme stratospheric events. The high-top ensemble composite (Figure 5b) is quite similar to the reanalyses; the NAM throughout the atmosphere shifts toward a negative index following an event at 10 hPa, with a slight lag in the troposphere, and persists for approximately 60 days in the lower stratosphere and troposphere. In the low-top ensemble, the initial response to stratospheric perturbations is similar, but with a slightly stronger response in the troposphere. The events, however, fail to persist as long, particularly in the troposphere. Composites based on extreme positive NAM events, when the polar vortex becomes very strong, reveal a similar bias in the low-top ensemble; the magnitude of the tropospheric response is correct, but short lived (not shown).

[39] Further analysis suggests that the differences in the models' tropospheric responses are due to a difference in the temporal variability of the lower stratosphere. Following Baldwin et al. [2003], the e-folding time scale of the NAM was computed as a function of height. The time scales of variability in the troposphere are fairly similar in the reanalysis and both model ensembles (all about 10–15 days). In the lower stratosphere, however, time scales are shorter in the low-top models, peaking at approximately 20 days, compared to approximately 30 days in the reanalyses and 35 days in the high-top ensemble. Both model ensembles show a bias in the timing of the seasonal peak, which occurs around January in the reanalyses compared to March in the models. This timing bias is consistent with that found in the CCMVal2 and CMIP3 models [Gerber et al., 2010]. The persistence of anomalies in the lower stratosphere appears critical to sustaining the tropospheric response.
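A simplified, season-independent version of the e-folding time scale calculation is sketched below: the lag autocorrelation of a NAM index time series is computed, and the lag at which it first falls below 1/e is found by linear interpolation. Baldwin et al. [2003] also resolved the seasonal dependence of the time scale, which this sketch omits, so it is an illustration of the idea rather than their exact method; the function names are hypothetical.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / var for k in range(max_lag + 1)])


def efolding_time(x, max_lag=120):
    """Lag (in samples, e.g. days) at which the autocorrelation first falls below 1/e.

    Uses linear interpolation between neighbouring lags; returns NaN if the
    autocorrelation stays above 1/e out to max_lag.
    """
    acf = autocorrelation(x, max_lag)
    target = np.exp(-1.0)
    below = np.flatnonzero(acf < target)
    if below.size == 0:
        return np.nan
    k = below[0]  # k >= 1 because acf[0] == 1
    return (k - 1) + (acf[k - 1] - target) / (acf[k - 1] - acf[k])
```

A convenient check is a red-noise (first-order autoregressive) series whose autocorrelation decays as exp(-lag/tau): the estimated e-folding time should then be close to tau. Applying the function to the NAM index at each level gives a profile that can be compared with the 20, 30, and 35 day lower-stratospheric values quoted above.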

[40] It is important to point out that extreme NAM variability is defined relative to the background variability in each model separately. As the variance in the extratropical middle stratosphere is smaller in the low-top models than in the high-top models (not shown), their NAM composites are based on smaller absolute changes to the geopotential height field. Nonetheless, the diagnostic highlights potential links between biases in natural stratospheric variability and the dynamical coupling between the stratosphere and troposphere in the low-top models.

5.2 Reproduction of Past Stratospheric Trends

[41] Because the CMIP5 models will be used extensively to understand and predict future climate, a key test of their fidelity in the stratosphere is their ability to reproduce past stratospheric trends. In this section, we test whether the missing stratospheric variability in the low-top models has any impact on their ability to reproduce past trends in the lower stratosphere. We focus here on the satellite period, for which coverage is almost globally complete, and compare the observed Remote Sensing Systems (RSS) Microwave Sounding Unit (MSU) temperature of the lower stratosphere (TLS) [Mears and Wentz, 2009] (a layer temperature with a weighting function maximum close to 80 hPa) with simulated temperatures from CMIP5 models. We compare with an MSU observational data set because stratospheric trends are not robust in reanalyses [Xu and Powell, 2011] and radiosonde coverage is spatially incomplete. We focus on the lower stratosphere because several of the models considered do not resolve the region covered by the stratospheric sounding units. Each historical simulation to 2005 was concatenated with either the corresponding extended historical simulation, where available, or the corresponding RCP 4.5 simulation, to compare with observations up to 2011 and to improve the signal-to-noise ratio. In both cases, radiative forcings are continuous through the simulations and close to those observed over the past 6 years [Gillett et al., 2012]. A fixed weighting function with a maximum near 80 hPa [Mears and Wentz, 2009] was applied to zonal mean monthly mean data. Each individual simulation was given equal weight in the analysis, meaning that models with larger ensembles were given more weight (see Table 2). IPSL-CM5A-LR, IPSL-CM5A-MR, and INMCM4 were excluded from the analysis because they did not include volcanic aerosols, and CNRM-CM5 was excluded because its simulated stratospheric ozone changes were underestimated compared to those observed.
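The main processing steps for the simulated TLS can be sketched as follows, assuming temperatures on pressure levels along the last axis of an array. The weighting function values are placeholders to be supplied by the user, not the published RSS TLS weights, and the helper names are hypothetical. The last function pools all individual simulations, so a model contributing more ensemble members receives proportionally more weight, as described above.

```python
import numpy as np


def layer_temperature(temp, weights):
    """Weighted vertical mean over the last axis (pressure levels) of temp.

    `weights` is a fixed weighting function sampled on the model levels
    (placeholder values, not the RSS weights); it is normalized to sum to one.
    """
    w = np.asarray(weights, dtype=float)
    return np.asarray(temp, dtype=float) @ (w / w.sum())


def concatenate_runs(historical, extension):
    """Join a historical run (to 2005) with its extension or RCP4.5 run (from 2006)."""
    return np.concatenate([historical, extension], axis=0)


def pooled_mean(runs_by_model):
    """Mean over all individual simulations, so larger ensembles carry more weight."""
    all_runs = [run for runs in runs_by_model.values() for run in runs]
    return np.mean(all_runs, axis=0)
```

Pooling simulations in this way differs from averaging model means first: with two members from one model and one from another, the first model contributes two thirds of the result rather than one half.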

[42] Figure 6a shows generally good agreement between simulated and observed TLS in both the high-top and low-top simulations. In 1992, following the Pinatubo eruption, the positive temperature anomaly is overestimated in both sets of simulations. Given that the models tend to underestimate the 50 hPa geopotential height anomalies associated with volcanic eruptions (Figure 4), it appears that the models may have some difficulty capturing the structure of the warming or other feedbacks with the circulation. The overestimation of the temperature response is larger for the high-top simulations and is explained mainly by an overestimation of the response to Pinatubo in MRI-CGCM3, even though its response to El Chichón is relatively consistent with observations. MRI-CGCM3 is the only model considered here that interactively simulates the stratospheric aerosol distribution and its radiative properties following volcanic eruptions; in the other models, the aerosol distribution and properties are prescribed from observations [Driscoll et al., 2012]. Our results suggest that this model's stratospheric aerosol scheme leads to an overestimation of the stratospheric temperature response to Pinatubo. A further apparent difference between the observations and simulations is in the first 3 years (1979–1981), in which observed TLS temperatures are higher than those in either the high-top or low-top ensemble. Overall, agreement between simulations and observations appears better for the CMIP5 simulations than for the CMIP3 simulations [Cordero and Forster, 2006], partly because the volcanic response is less overestimated than in CMIP3 and perhaps also because of the more realistic ozone changes specified in the CMIP5 simulations. Agreement with observations also appears better for the CMIP5 models than for the CCMVal models [Forster et al., 2011; Austin et al., 2009], perhaps because the former mostly use an observational data set of stratospheric ozone changes, usually derived from Cionni et al. [2011], rather than the internally simulated ozone concentrations of the CCMVal simulations. It is important to note, however, that the observed decreases in ozone concentrations following major volcanic eruptions are not included in the forcing data set, and hence the impact of stratospheric aerosol changes on the lower-stratospheric temperature trends may be slightly amplified.

Figure 6.

Comparison of simulated and observed lower stratospheric temperatures. (a) Global mean annual mean lower stratospheric temperature anomalies from the 1979–2011 mean, based on the Remote Sensing Systems (RSS) data set (black), the mean of the high-top models (red), and the mean of the low-top models (blue). Global means are calculated over the observed latitudes 82.5°S–82.5°N. Pink and blue shaded bands show the approximate 5–95% ensemble ranges. (b) Corresponding zonal mean temperature trends in °C over the 1979–2011 period. Volcano years (1982, 1983, 1991, 1992, and 1993) were excluded from the analysis, and linear trends were fitted by least squares.
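The global mean, anomaly, and ensemble-range calculations behind Figure 6a can be sketched as follows. We assume area weighting by the cosine of latitude, which the text does not state explicitly, and take the 5th and 95th percentiles across ensemble members as the approximate 5–95% range; the function names are illustrative.

```python
import numpy as np


def global_mean(zonal, lats, lat_limit=82.5):
    """Cosine-latitude-weighted mean of zonal mean data over |lat| <= lat_limit.

    Latitude is the last axis of `zonal`.
    """
    lats = np.asarray(lats, dtype=float)
    mask = np.abs(lats) <= lat_limit
    w = np.cos(np.deg2rad(lats[mask]))
    return np.asarray(zonal, dtype=float)[..., mask] @ (w / w.sum())


def anomalies(series, years, base=(1979, 2011)):
    """Anomalies from the mean over the base period (inclusive); time is the last axis."""
    series = np.asarray(series, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    return series - series[..., in_base].mean(axis=-1, keepdims=True)


def ensemble_range(members, lo=5, hi=95):
    """Approximate lo-hi percentile range across ensemble members (axis 0)."""
    return np.percentile(members, [lo, hi], axis=0)
```

The latitude mask reproduces the restriction to the observed latitudes stated in the caption, so model and observed global means cover the same area.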

[43] Figure 6b compares the pattern of zonal mean temperature trends in simulations and observations, with the 2 years following El Chichón and Pinatubo excluded. Cooling is seen at all latitudes in the observations and in both the high-top and low-top ensemble means. Agreement with observations is generally good, and there is no significant difference between the high-top and low-top simulations in their level of agreement with observations. Whereas observed trends calculated over the 1979–2005 period show little latitudinal variation [Gillett et al., 2011], in part due to anomalous warmth over the Antarctic in 2002 [Varotsos, 2002; Allen et al., 2003], the trend calculated over the 1979–2011 period shows enhanced Antarctic cooling, consistent with the simulations. This is associated with the years since 2005 all being colder than average over the Antarctic, with 2011 particularly cold [Seidel et al., 2011].
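The trend calculation for Figure 6b, with volcano years removed before a least-squares linear fit, might look like the following sketch. The excluded years are those listed in the Figure 6 caption; slopes are returned per year, and any rescaling to the units plotted in Figure 6b is left to the user. The function name is hypothetical.

```python
import numpy as np

# Years excluded in the Figure 6 caption (El Chichon and Pinatubo aftermath)
VOLCANO_YEARS = (1982, 1983, 1991, 1992, 1993)


def trend_excluding_years(years, data, exclude=VOLCANO_YEARS):
    """Least-squares linear trend (per year) of data along axis 0, skipping `exclude`.

    `data` can be 1-D (years) or 2-D (years x latitudes); one slope is returned
    per column.
    """
    years = np.asarray(years, dtype=float)
    data = np.asarray(data, dtype=float)
    keep = ~np.isin(years, exclude)
    coeffs = np.polyfit(years[keep], data[keep], 1)  # rows: slope, intercept
    return coeffs[0]
```

Dropping the volcano years, rather than fitting through them, keeps the short-lived posteruption warming from biasing the slope; because both eruptions fall in the first half of the 1979–2011 record, including them would tend to make the fitted cooling trend steeper.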

[44] Overall, we find that the CMIP5 models considered are able to simulate observed temperature changes in the lower stratosphere rather well. The CMIP5 models appear to perform better in this respect than either the CMIP3 models or the CCMVal models, which may relate to their realistically prescribed stratospheric ozone variations. Because there are few significant differences between the temperature trends simulated by the low-top and high-top models, the simulation of stratospheric trends by the low-top models seems unaffected by their reduced stratospheric variability.

6 Discussion and Conclusions

[45] In this study, we compared the simulation of stratospheric climate and variability by two subensembles of the full CMIP5 ensemble, one containing high-top models and one containing low-top models. Away from the model top, the simulation of mean stratospheric climate by the two ensembles is similar and better than in the CMIP3 ensemble. However, the low-top ensemble has a poorer simulation of stratospheric daily and interannual variability than the high-top ensemble. In the Northern Hemisphere, this difference is due to a lack of episodic stratospheric variability in midwinter in the low-top models.

[46] The lack of dynamical variability in the low-top models has an impact on stratosphere-troposphere coupling, resulting in relatively short-lived anomalies in the Northern Annular Mode, which do not persist in either the stratosphere or troposphere. In this respect, the high-top models more faithfully reproduce the stratosphere-troposphere coupling seen in the reanalysis data. Conversely, however, the lack of daily and interannual stratospheric variability in the low-top models does not seem to affect their ability to reproduce historical trends in lower stratospheric temperature. Both the high-top and low-top ensembles are able to reproduce global lower stratospheric temperature trends with good fidelity.

[47] Of the three time scales of variability examined, decadal variability is the least well represented by CMIP5 models compared to observations, with little difference between the high-top and low-top ensembles. Examining the stratospheric response to volcanic eruptions, a significant source of forced decadal variability, we find realistic global mean temperature responses in the models, but the models do not reproduce the strong dynamical response seen in observations.

[48] Few of the models in CMIP5 place their model top at 10 hPa, which would severely truncate stratospheric dynamical behavior, but it is clear from our analysis that placing the model top below the stratopause still has the potential to severely limit stratospheric dynamical variability. Simulating realistic stratospheric dynamical variability remains a challenge for all climate models, even those with high tops. For example, work with the suite of Canadian climate models has shown that the simulation of stratospheric dynamical variability is very sensitive to small changes in the stratospheric mean state, so it is very important to continue working to understand biases in the stratospheric climate of climate models and to seek to improve them [see, for example, Sigmond and Scinocca, 2010]. The single-model studies by Hardiman et al. [2012] and Shaw and Perlwitz [2010] reach a similar conclusion: the current generation of low-top and high-top models have very similar simulations of stratospheric mean climate and trends but differ in their simulation of stratospheric variability.


[49] We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available the model output listed in Table 1. For CMIP the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. A.J.C.-P. and L.J.W. were supported by a National Centre for Atmospheric Science CMIP5 grant. M.P.B. was funded by NSF under the US CLIVAR program and the Office of Polar Programs. T.B. and N.A.D. acknowledge support by the U.S. National Science Foundation. The research efforts of R.X.B., B.A.M. & Y.-Y.L. were conducted under support by the U.S. Department of Energy, Office of Biological and Environmental Research, Award No. DE-FOA-000024 and by the National Science Foundation Grant, ARC-1107384. N.C. was supported by the Spanish Ministry of Science and Innovation (MCINN) through the CGL2008-05968-C02-01 project. E.P.G. was supported by the National Science Foundation. The work of M.T. and K.K. contributes to the BMBF joint research project MiKlip within the project ALARM through the grant 01LP1130B. The work of S.C.H. was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101). We also acknowledge the European Commission's 7th Framework Programme, under Grant Agreement number 226520, COMBINE project which supplied some data not available from the CMIP5 archive. We also thank Gerard Devine (NCAS-CMS) for help with accessing and parsing metadata information from the models.