Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, Oxford, UK
Corresponding author: S. Driscoll, Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford OX1 3PU, UK. (firstname.lastname@example.org)
 The ability of the climate models submitted to the Coupled Model Intercomparison Project 5 (CMIP5) database to simulate the Northern Hemisphere winter climate following a large tropical volcanic eruption is assessed. When sulfate aerosols are produced by volcanic injections into the tropical stratosphere and spread by the stratospheric circulation, they cause not only globally averaged tropospheric cooling but also localized heating in the lower stratosphere, which can cause major dynamical feedbacks. Observations show a lower stratospheric and surface response during the following one or two Northern Hemisphere (NH) winters that resembles the positive phase of the North Atlantic Oscillation (NAO). Simulations from 13 CMIP5 models that represent tropical eruptions in the 19th and 20th century are examined, focusing on the large-scale regional impacts associated with the large-scale circulation during the NH winter season. The models generally fail to capture the NH dynamical response following eruptions. They do not sufficiently simulate the observed post-volcanic strengthened NH polar vortex, positive NAO, or NH Eurasian warming pattern, and they tend to overestimate the cooling in the tropical troposphere. The findings are confirmed by a superposed epoch analysis of the NAO index for each model. The study confirms previous similar evaluations and raises concern for the ability of current climate models to simulate the response of a major mode of global circulation variability to external forcings. This is also of concern for the accuracy of geoengineering modeling studies that assess the atmospheric response to stratosphere-injected particles.
 For a volcano to have a significant long-term impact on the climate it must inject a sufficient amount of sulfur containing gases into the stratosphere [Robock, 2000]. Once in the stratosphere the sulfate gas undergoes a chemical reaction to produce sulfate aerosol. The e-folding time of the sulfate gas to particle conversion is typically 30–40 days [Forster et al., 2007]. Sulfate aerosol scatters back to space the incoming shortwave radiation (SW) and also absorbs solar near infrared (NIR) radiation and upwelling long wave (LW) radiation from the surface and atmosphere below [Stenchikov et al., 1998; Ramachandran et al., 2000; Andronova et al., 1999]. For a given mass load, the scattering of SW radiation is modulated by the particle size distribution and as the aerosol particle size increases, scattering of incoming SW radiation decreases [Timmreck et al., 2009; Rasch et al., 2008]. The decrease in incoming shortwave radiation results in a cooling of Earth's surface [Robock and Mao, 1995]. The typical e-folding lifetime for tropically injected volcanic aerosols is about 12–14 months [Lambert et al., 1993; Baran and Foot, 1994; Barnes and Hofmann, 1997], causing surface cooling for about two years following an eruption.
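The link between the 12–14 month e-folding lifetime and the roughly two-year cooling can be illustrated with a short calculation. The sketch below assumes a pure exponential decay of the aerosol load after its peak, which is a simplification introduced here and not a statement about how any of the cited studies model the decay; the function name is hypothetical.

```python
import math

def remaining_fraction(months_elapsed, e_folding_months):
    """Fraction of the peak aerosol load left after `months_elapsed`,
    assuming simple exponential decay (an illustrative assumption)."""
    return math.exp(-months_elapsed / e_folding_months)

# Fraction left two years after the peak for the quoted lifetime range.
for lifetime in (12.0, 14.0):
    frac = remaining_fraction(24.0, lifetime)
    print(f"e-folding {lifetime:.0f} months: {frac:.2f} of peak load after 24 months")
```

Under this assumption less than a fifth of the peak load remains after two years, which is consistent with the cooling fading on about that timescale.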
 In contrast, localized equatorial heating, around 3 K for the Pinatubo eruption of June 1991 [Stenchikov et al., 2002], occurs in the lower stratosphere due to the increase in absorption of NIR and LW radiation by the sulfate aerosols. For a tropical volcanic eruption the heating in the tropical stratosphere creates anomalous temperature and density gradients between the equator and poles. By the thermal wind relationship, this causes a strengthening of the zonal winds, which results in a strengthened stratospheric polar vortex. In addition, reduced surface temperatures in the tropical regions reduce the meridional surface temperature gradient, and this has been associated with a reduction in the Eliassen Palm (EP) Flux - essentially, a measure of planetary wave activity from the troposphere into the stratosphere [Andrews et al., 1987] - and hence a stronger, less disturbed vortex. Further, chemical reactions which result in ozone depletion serve to cool and strengthen the vortex, and the reduced temperatures cause more NH ozone depletion, creating a positive feedback loop [Stenchikov et al., 2002].
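The thermal wind argument above can be written compactly. As a sketch, using a standard textbook form in log-pressure coordinates (the symbols are introduced here for illustration and do not appear in the original text):

```latex
f \,\frac{\partial \bar{u}}{\partial z} = -\frac{R}{H}\,\frac{\partial \bar{T}}{\partial y}
```

Here $f$ is the Coriolis parameter, $\bar{u}$ the zonal-mean zonal wind, $\bar{T}$ the zonal-mean temperature, $z$ the log-pressure height, $y$ the northward distance, $R$ the gas constant for dry air and $H$ the scale height. In the NH ($f > 0$), aerosol heating of the tropical lower stratosphere makes $\partial \bar{T} / \partial y$ more negative between the tropics and the pole, so $\partial \bar{u} / \partial z$ becomes more positive and the westerlies strengthen with height.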
 A substantial body of research has indicated an influence of the stratospheric vortex on high latitude circulations at Earth's surface, with a strengthened vortex associated with a positive North Atlantic Oscillation/Arctic Oscillation [Baldwin and Dunkerton, 1999, 2001; Thompson et al., 2002; Black, 2002; Kolstad and Charlton-Perez, 2010; D. Mitchell et al., The influence of stratospheric vortex displacements and splits on surface climate, submitted to Journal of Climate, 2012]. The North Atlantic Oscillation (NAO) is an index corresponding to the difference in mean sea level pressure (MSLP) between the Azores and Iceland [Rodwell et al., 1999; Hurrell and Deser, 2009], and the Arctic Oscillation (AO) is defined as the first hemispheric empirical orthogonal function (EOF) of sea level pressure variability [Thompson and Wallace, 1998; Stenchikov et al., 2002]. Essentially the NAO can be thought of as the AO over the Atlantic region [Christiansen, 2008]. A positive AO corresponds to anomalously low pressure over the pole, and anomalously high pressure at midlatitudes, and vice versa for the negative AO. After large volcanic eruptions a positive phase of the AO has been observed for the following 1 to 2 winters [Robock and Mao, 1992; Stenchikov et al., 2002]. The associated stronger westerly winds cause anomalous advection of warm oceanic air overland, and this results in anomalously warm temperatures over major NH landmasses [Thompson and Wallace, 2001]. Thus, as a combined result of the surface cooling and lower stratospheric tropical heating, a dynamical feedback occurs during NH winter, which results in surface warming over Northern America, Northern Europe and Russia [Robock and Mao, 1992]. Negative surface temperature anomalies in the Middle East are also a distinctive feature of post-volcanic winters consistent with the positive phase of the AO [Stenchikov et al., 2006] (S06 hereafter).
 Climate model simulations of the historical period have, so far, been able to produce a slightly strengthened stratospheric vortex, but much weaker than the observations, and have failed to reproduce a positive AO and warming/cooling patterns over Eurasia and the Middle East respectively for the two NH winters following volcanic eruptions (S06). S06 analyzed seven models used for the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Intergovernmental Panel on Climate Change, 2007]. They included all the models that specifically represented volcanic eruptions by including a layer of aerosol, and excluded those that either did not represent them, or represented them simply by a reduction in the solar constant. They found that the temperature increase in the lower equatorial stratosphere, caused by radiative heating from the aerosol, was reproduced by all the models. However, the models showed less agreement with the observed post eruption NH winter polar lower stratospheric cooling. Further, the 50 hPa polar geopotential height (indicative of the strength of the stratospheric polar vortex) in the models generally showed almost no change whereas the observations show a large negative anomaly of about 200 m, revealing a statistically significant stronger than average polar vortex at the 90% level. Furthermore, the AO responses in the model simulations were significantly weaker than in observations, indeed, Otterå  notes that some model simulations show no AO response. Correspondingly the strength and spatial pattern of the surface temperature anomalies were not well reproduced.
 Since the previous analysis of S06, who used simulations from the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multimodel data set [Meehl et al., 2007], climate models have undergone changes and improvements, and spatial and vertical resolutions have been increased. In this study, we repeat the analysis of S06 using model simulations from the Coupled Model Intercomparison Project phase 5 (CMIP5) [Taylor et al., 2011] and focus our analysis on the impact of the largest volcanic eruptions on the NH winter circulation. The models and experiments are described in section 2, results are presented in section 3, and in section 4 we present our discussion and conclusions.
2. Models and Experiments
 The model runs analyzed in this study come from the historic simulations of the climate of the 20th century as standardized for the CMIP5. Models were forced with natural and anthropogenic forcings from the late 19th century to the early 2000s. Although the major external forcings (such as solar, greenhouse gases, land use) are standardized based on the most recent observational databases, no specific recommendations were issued for other forcings such as the stratospheric injection of sulfate aerosols from explosive volcanic eruptions. As for the CMIP3, most modeling groups imposed the stratospheric emissions for volcanic eruptions either from the reconstructions of Ammann et al.  (AM), its update Ammann et al.  (AM07), or from the updated version of Sato et al.  (ST, updates available at data.giss.nasa.gov/modelforce/strataer). The AM data set provides monthly latitudinal distributions of stratospheric optical depth for each volcanic event in 64 latitude bands, computed with an explicit representation of the spread of the aerosol cloud, taking into account the seasonal variations in stratospheric transport. A fixed particle size distribution is assumed for all eruptions, with spherical droplets of sulfuric acid of effective radius of 0.42 μm. AM, however, only extends back to 1890. An updated data set AM07 provides data well before the start of the historical simulations (1850) and many modeling groups use either AM07 or combine AM with ST to overcome this problem, as we detail for individual models in Table 1.
Table 1. Models Used in the Study and Their Basic Characteristics, Imposed Aerosol Forcing and Number of Ensemble Members Available (table body not recovered)
Table 1 note: Basic characteristics are horizontal resolution, vertical levels and model top. In the last column are listed the variables analyzed for each model (T = 1.5 m temperature, P = mean sea level pressure, Z = geopotential height).
Table 1 reference: Wu et al. [2008, also The 20th century global carbon cycle from the Beijing Climate Center Climate System Model (BCCCSM), submitted to Journal of Climate, 2012].
 The ST data set provides monthly latitudinal zonal mean stratospheric optical depths for 24 layers between 15 km and 35 km together with variations of the particle's effective radius based on the observations of the 1991 Mt. Pinatubo and 1982 El Chichón eruptions. In the GFDL-CM3 model the optical characteristics were calculated following Stenchikov et al.  using the optical depths from the ST data set and its updates.
 Unlike the other models, MRI-CGCM3 interactively computes the conversion from SO2 amount to stratospheric aerosol. It includes the aerosol model MASINGAR mk-2 [Tanaka et al., 2003], which calculates five species (sulfate, black carbon, organic carbon, mineral dust, and sea-salt) of aerosols from emissions and other processes, including sulfate aerosol of volcanic origin. The aerosol model is interactively coupled with the atmospheric component that calculates radiation and cloud microphysics and utilizes the inventory of volcanic SO2 emissions provided by Stothers , Bluth et al. , Andres and Kasgnoc , Stothers  and the optical properties of spherical sulfate aerosol droplets provided by OPAC (Optical Properties of Aerosol and Clouds) [Hess et al., 1998].
 We restricted model analysis to those models that were both forced with volcanic aerosol in the stratosphere and had at least 2 ensemble members, which yielded a total of 13 different climate models. The models with a brief description of the basic characteristics are listed in Table 1. Three models, GISS-E2-R, CCSM4 and GFDL-CM3, in their updated version, are common to both our analysis and that of S06.
 For comparison with observations the reanalysis of the 20th Century version 2 (20CRv2) [Compo et al., 2011] is employed. From this data set we will use only near-surface temperature and mean sea level pressure (MSLP) fields for the period of 1871 to 2008. Our results compare similarly across a number of observational reconstructions such as HadCRUT2v and HadSLP1 (used in S06), and so the choice of product does not alter our conclusions. More information about the database is provided at http://www.esrl.noaa.gov/psd/. The ERA40 [Uppala et al., 2005] and NCEP/NCAR [Kistler et al., 2001] reanalysis fields are also used to compare with middle atmosphere circulation changes during the winter season for the largest eruptions after 1950.
 To isolate the anomalies of the post-volcanic seasons and generate the average volcanic composite, we adopt the same averaging procedure employed by S06, choosing a different reference time for each eruption and averaging two winter seasons after each eruption. The statistical significance of anomalies from the mean climatology is evaluated with a local two-tailed t-test. We also compute the multimodel mean of the post-volcanic anomalies, averaging with equal weight the ensemble mean of each model. All models have been interpolated to a common 2.5° latitude × 3.75° longitude grid.
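The equal-weight multimodel mean described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the anomaly maps are already on the common grid, and the function name, the dictionary layout and the model names are hypothetical.

```python
import numpy as np

def multimodel_mean(anomalies_by_model):
    """Equal-weight multimodel mean of post-volcanic anomaly maps.

    anomalies_by_model: dict mapping a model name to an array of shape
    (n_members, n_lat, n_lon), all on the same common grid.
    """
    # First average the ensemble members of each model ...
    ensemble_means = [np.asarray(a, dtype=float).mean(axis=0)
                      for a in anomalies_by_model.values()]
    # ... then average over models, each with weight 1 / n_models.
    return np.mean(ensemble_means, axis=0)

# Toy example: model_a has 2 members, model_b has 4 (hypothetical names).
toy = {
    "model_a": np.zeros((2, 3, 4)),
    "model_b": np.full((4, 3, 4), 3.0),
}
mmm = multimodel_mean(toy)                                   # 1.5 everywhere
pooled = np.concatenate(list(toy.values())).mean(axis=0)     # 2.0 everywhere
```

The toy case shows why the order of averaging matters: pooling all members would let the model with more members dominate, whereas averaging the ensemble means gives each model the same weight.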
 Using a large number of eruptions and minimum of two ensemble members per model (lending an equal weight to each ensemble member in the computation) should help to average out spurious effects, for example due to incorrect sampling of the El Niño Southern Oscillation (ENSO) cycles, which cannot be controlled in these coupled ocean atmosphere simulations. However, we also calculate the 3.4 ENSO index for each model (Table 3) by computing the area averaged total SST from the Niño 3.4 region, computing the monthly climatology (1950–1979) for area averaged total SST from the Niño 3.4 region, and subtracting the climatology from the area averaged total SST time series to obtain anomalies. These anomalies are then smoothed with a five-month running mean, and then normalized by the standard deviation over the climatological period (1950–1979).
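The index calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input is a monthly series of area-averaged Niño 3.4 SST that starts in January, the running mean is centered, and the standard deviation is taken from the smoothed anomalies within the climatological period. The function and argument names are hypothetical.

```python
import numpy as np

def nino34_index(sst, years, clim_start=1950, clim_end=1979, window=5):
    """Normalized Niño 3.4 index from monthly area-averaged SST.

    sst   : 1-D array of monthly area-mean SST, first element a January.
    years : 1-D array with the calendar year of each month.
    """
    sst = np.asarray(sst, dtype=float)
    years = np.asarray(years)
    months = np.arange(sst.size) % 12
    in_clim = (years >= clim_start) & (years <= clim_end)

    # Monthly climatology over the reference period.
    clim = np.array([sst[in_clim & (months == m)].mean() for m in range(12)])
    anom = sst - clim[months]

    # Centered running mean; the first and last window // 2 months stay NaN.
    half = window // 2
    smooth = np.full(sst.size, np.nan)
    smooth[half:sst.size - half] = np.convolve(
        anom, np.ones(window) / window, mode="valid")

    # Normalize by the standard deviation over the reference period.
    return smooth / np.nanstd(smooth[in_clim])
```

A typical call would pass the series from one ensemble member, for example `nino34_index(sst_member, years)`, and then read off the index value for each post-volcanic winter.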
U50hPa is the winter (DJF) seasonal climatological zonal wind computed for two regions, 30°S–30°N and 55°N–65°N. The standard deviation is given in brackets. The last column shows the ENSO 3.4 index (see text). The last row gives the climatological wind from ERA40 and the ENSO 3.4 index from 20CRv2 based on HadISST.
Christiansen  showed through analysis of observations that the largest volcanic eruptions of the 20th Century tend to be followed by a positive index of the North Atlantic Oscillation (NAO). He noted that the NAO signal is strongest and significant in the first year after the eruption and does not appear to be influenced by ENSO events or by the specific volcanic eruption chosen for the composite.
 We computed the NAO index for each model and each ensemble member to test whether the simulated dynamical response to volcanic forcing projects onto the NAO index as observed by Christiansen  in the observations. The NAO index is computed for each ensemble member of each model, as in Christiansen . We first compute the Empirical Orthogonal Functions (EOFs) of the monthly winter (DJF) MSLP anomalies north of 20°N and between 110°W and 70°E for the period 1948–2000. Each pressure data point is weighted by the square root of the grid area it represents, consistent with Christiansen . The seasonal winter (DJF) NAO index is computed from the monthly indices, defined as the principal component of the monthly anomalies of the MSLP projected onto the first EOF for the total period 1860–2000 and normalized to unit variance. The same index is computed for the 20CRv2 MSLP data. The EOF pattern for each model is shown in Figure 1.
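The EOF-based index described above can be sketched with a singular value decomposition. This is an illustration only, with several assumptions that go beyond the text: the input already contains MSLP anomalies restricted to the Atlantic sector and ordered as consecutive December, January, February months; the square root of the cosine of latitude stands in for the square root of the grid area on a regular grid; the monthly principal components are averaged per winter before normalization; and the sign is fixed so that the northernmost row has negative loading. All names are hypothetical.

```python
import numpy as np

def nao_index(mslp_anom, lat, base_mask):
    """Winter NAO index from monthly DJF MSLP anomalies.

    mslp_anom : array (n_months, n_lat, n_lon), months ordered D, J, F per winter.
    lat       : latitudes in degrees, length n_lat.
    base_mask : boolean array (n_months,) selecting the EOF base period.
    Returns (eof1 with shape (n_lat, n_lon), index with shape (n_winters,)).
    """
    nt, nlat, nlon = mslp_anom.shape
    if nt % 3 != 0:
        raise ValueError("expected three months (D, J, F) per winter")

    # Area weighting: sqrt(cos(lat)) is proportional to sqrt(cell area).
    weights = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]
    x = (mslp_anom * weights).reshape(nt, nlat * nlon)

    # Leading EOF of the base period from the SVD of the centered data.
    xb = x[base_mask] - x[base_mask].mean(axis=0)
    _, _, vt = np.linalg.svd(xb, full_matrices=False)
    eof1 = vt[0]

    # The EOF sign is arbitrary; enforce low pressure in the north (assumption).
    if eof1.reshape(nlat, nlon)[np.argmax(lat)].mean() > 0:
        eof1 = -eof1

    # Project the full period, average D, J, F, normalize to unit variance.
    pc_monthly = x @ eof1
    pc_winter = pc_monthly.reshape(-1, 3).mean(axis=1)
    index = (pc_winter - pc_winter.mean()) / pc_winter.std()
    return eof1.reshape(nlat, nlon), index
```

Computing the EOF on a base period and projecting the longer record onto it keeps the spatial pattern fixed, so index values from different decades are comparable.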
 We compare models and reanalysis using a superposed epoch analysis of the winter NAO (DJF) for the nine volcanic eruptions listed in Table 2. We take the winters in the neighboring ten years close to the first winter after each eruption (five years before and five years after) as defined in Table 2 and generate an “eruption matrix” whose rows represent each eruption event. The eruptions in each ensemble member are considered to be independent events, hence the number of rows in the “eruption matrix” is different for each model because it depends on the number of ensemble members. The rows are then averaged to obtain the epoch composite of 11 years, from winter in year −5 to winter in year +5 with year 0 the first winter after an eruption.
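The construction of the "eruption matrix" and the epoch composite can be sketched as follows. This is a minimal illustration, assuming the winter NAO index is stored as one row per ensemble member and that each winter is labeled by a single year; the function names and the labeling convention are hypothetical.

```python
import numpy as np

def eruption_matrix(nao, winter_years, first_winters, half_window=5):
    """Stack one row per (ensemble member, eruption) pair.

    nao           : array (n_members, n_winters) of winter NAO values.
    winter_years  : array (n_winters,) with the year label of each winter.
    first_winters : year labels of the first winter after each eruption.
    Each row spans lags -half_window ... +half_window around lag 0.
    """
    nao = np.atleast_2d(np.asarray(nao, dtype=float))
    winter_years = np.asarray(winter_years)
    rows = []
    for member in nao:
        for year in first_winters:
            hits = np.flatnonzero(winter_years == year)
            if hits.size != 1:
                raise ValueError(f"winter {year} not found exactly once")
            i = int(hits[0])
            if i - half_window < 0 or i + half_window >= member.size:
                raise ValueError(f"window around {year} leaves the record")
            rows.append(member[i - half_window:i + half_window + 1])
    return np.array(rows)

def epoch_composite(matrix):
    """Average the rows: one composite value per lag."""
    return matrix.mean(axis=0)
```

With two ensemble members and nine eruptions the matrix has 18 rows and 11 columns, and the composite value at column index 5 corresponds to lag 0.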
 The statistical significance of the epoch analysis is estimated using the bootstrap method [Efron and Tibshirani, 1986]. We reshuffle with replacement the elements of each row to generate a new “random eruption matrix” and average the rows into a new epoch composite. The procedure is repeated 5,000 times obtaining a distribution of NAO values for each lag of the epoch composite. The random composites are drawn from the original epoch matrix to preserve the structure of the sample. We also adopted the normalization procedure described in Adams et al.  to avoid possible biases due to single outliers in each volcanic window, but the main conclusions are not affected by the normalization. We compare the level of the NAO index for each year of the composite with the 5%–95% and 1%–99% percentile levels of the bootstrap distribution.
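The resampling step can be sketched as below. This is an illustration of the row-wise reshuffling with replacement described above, not the authors' implementation: the normalization of Adams et al. is omitted, and the function names and the fixed random seed are assumptions.

```python
import numpy as np

def bootstrap_composites(matrix, n_boot=5000, seed=0):
    """Random epoch composites from row-wise resampling with replacement.

    matrix : eruption matrix of shape (n_rows, n_lags).
    Returns an array (n_boot, n_lags) of random composites.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_lags = matrix.shape
    row_idx = np.arange(n_rows)[:, None]
    out = np.empty((n_boot, n_lags))
    for b in range(n_boot):
        # For every row, draw n_lags column positions with replacement.
        col_idx = rng.integers(0, n_lags, size=(n_rows, n_lags))
        out[b] = matrix[row_idx, col_idx].mean(axis=0)
    return out

def percentile_bands(boot, levels=(1, 5, 95, 99)):
    """Percentile levels of the bootstrap distribution at each lag."""
    return {p: np.percentile(boot, p, axis=0) for p in levels}
```

A composite value at a given lag is then compared with the bands at that lag, for example `composite[5] > percentile_bands(boot)[95][5]` for the lag 0 winter.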
 We also tested for the occurrence of positive NAO in both the first and second post-volcanic winters; its significance is tested using a binomial distribution with the probability of the single event (σ) estimated from the full time series. As noted in Christiansen , σ is in general different from 0.5, which is due to the probability distribution of the NAO index not being normal. σ for each model is reported in Figures 10 and 11.
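The binomial test on the number of positive NAO winters can be sketched as a one-sided tail probability. The sketch assumes the test asks for the chance of at least k positive events in n winters; the function names are hypothetical, and the value σ = 0.5 in the example is used only for illustration, since σ is estimated separately for each data set.

```python
from math import comb

def positive_fraction(index_values):
    """Estimate sigma as the fraction of winters with a positive index."""
    values = list(index_values)
    return sum(v > 0 for v in values) / len(values)

def binomial_tail(k, n, sigma):
    """P(X >= k) for X ~ Binomial(n, sigma)."""
    return sum(comb(n, j) * sigma**j * (1 - sigma)**(n - j)
               for j in range(k, n + 1))

# Seven positive winters out of nine, with sigma = 0.5 for illustration.
p_value = binomial_tail(7, 9, 0.5)
print(round(p_value, 4))  # 0.0898
```

With σ = 0.5 the tail probability is 46/512, about 0.09. A value of σ estimated from the full time series would shift this figure somewhat.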
 The main conclusions are robust with respect to the definition of the winter season (DJF or DJFM) and we will present here the results for the NAO index computed for the DJF composite to allow comparison with previous results in the literature.
3.1. Direct Radiative Effect of Volcanic Aerosol
 Due to a lack of direct information on the radiative forcing of volcanic aerosol for each model, we choose to use the time series of the anomalies in the reflected short wave (SW) radiation at the top of atmosphere (TOA) (Figure 2) as a rough proxy for the global radiative effect of the stratospheric aerosol, as in S06 (their Figure 1). All the models perform consistently with each other and show the increase in the reflected SW radiation corresponding to the major explosive eruptions and do not show any appreciable differences compared with the CMIP3 models shown in S06. The largest anomaly in the reflected SW radiation is observed for the bcc-csm1-1 model whereas MRI-CGCM3 simulates the lowest signal among the models. MRI-CGCM3 computes interactively the effect of the volcanic aerosol from the stratospheric SO2 load and shows a lower scattering efficiency of incoming SW radiation with respect to the other models, even in the satellite-constrained era. This is possibly due to the interactive chemistry conversion processes affecting the properties of the aerosol created from the SO2 in the lower stratosphere. Large differences between this model and all other models, which are forced by imposed changes in lower stratospheric optical depths, raise questions about the realism of the MRI model with regard to the TOA anomalies.
 As noted in S06, larger spread among the model response is observed for the early eruptions and less uncertainty appears for the most recent El Chichón and Pinatubo events. Notably, the largest effect on the reflected SW radiation for the eruptions pre-1900 is observed in the models that adopt the AM reconstruction.
 As a measure of the anomalous heating forced by the volcanic aerosol in the lower stratosphere, we analyzed the anomalies in the de-trended 30°S–30°N, 50 hPa temperature. Figure 3 shows that the models simulate an increase in the lower stratospheric temperature of about 2 K, up to 4 K for the largest eruptions of Pinatubo and Krakatau. The largest temperature anomalies are simulated by the models using the AM database, with heating for the Pinatubo eruption up to 10 K for CCSM4 and 7 K for NorESM1-M. MRI-CGCM3 shows anomalies close to the multimodel mean and generally larger than observed for the models using the Sato et al.  database, but places the peak of the warming associated with the eruption of Agung about one year later than the other models.
 The multimodel mean appears in good agreement with the temperature anomalies from the ERA40 reanalysis for the eruptions after 1960. The overestimation of the warming associated with Pinatubo is likely in part due to the cooling effect of the easterly phase of the QBO in the winter 1991–1992 [Ramachandran et al., 2000; Stenchikov et al., 2004], not accounted for in the CMIP5 models.
3.2. Surface Temperature and Mean Sea Level Pressure
Figure 4 shows the NH composites of surface temperature, mean sea level pressure (MSLP) and geopotential heights for the observations and the multimodel mean. We first focus on the surface temperature and MSLP for the post-volcanic winter season (as given in the fourth column in Table 2). Figure 4a shows in the reanalysis the well-known significant surface warming signal over northern Europe and Asia, where anomalies reach up to 2 K. Significant cooling is observed over NE America and also, though not significant, over the Middle East. As noted in S06, a warming signal also appears over the Eastern Pacific but this could be spurious due to a positive ENSO sampling bias. A general cooling is observed in the Tropical region, although weak and barely significant. The reanalysis surface temperature anomaly in the Arctic region appears unusually warm, but the reliability of the reconstructed lower tropospheric temperature at high latitudes reduces the significance of the anomaly [Compo et al., 2011].
 The observed surface temperature anomalies in the NH post-volcanic winters are closely related to changes in the winter circulation as confirmed by the MSLP anomalies (Figure 4c). In agreement with previous studies (e.g., S06), in the reanalysis a significant positive NAO-like pattern marks the North-Atlantic region, with negative pressure anomalies in the Arctic region and positive over the North-Atlantic. Notice that the minimum and maximum of the anomaly are both displaced northward with respect to the pattern of the leading mode of variability in the MSLP anomalies in the region as observed in Figure 1 for the 20CRv2.
 The multimodel aggregate of surface temperature and MSLP shows no such pattern (Figures 4b and 4d). A general cooling is observed in the surface temperature anomaly field; however, no dynamical response to a large tropical volcanic eruption can be seen in the multimodel aggregate. Figures 5a and 5b reveal large areas of significantly different temperature and MSLP between the observations and models, especially over areas associated with the positive NAO and DJF surface warming.
Figures 6 and 7 show the NH composites of surface temperature and MSLP for the post-volcanic winter season in the individual models. Large variability is observed between the models in their NH response: the observed warming in northern Eurasia is simulated by a few models but is much weaker than in the observations. For example, GISS-E2-H and GISS-E2-R simulate the northern European warming pattern reasonably well but the maximum amplitude is only 0.5 K. The cooling over NE Canada seems to be simulated more widely, independent of how well the northern Eurasian warming is captured. Some models (CSIRO-Mk3.6, HadGEM2-ES, NorESM1) simulate a general cooling in the Asian-European area, opposite to the observations, and the majority show a significant cooling in the tropical lower latitudes, of around 0.2 K over the oceans.
 Large inter-model differences in MSLP pattern are shown in Figure 7. Only CNRM-CM5 and CanESM2 reproduce a weak dipole over the North-Atlantic, whereas NorESM1 shows anomalies opposite to those observed. The other models only show weak anomalies with minimal statistical significance. The two GISS models simulate weak surface temperature anomalies but do not show any significant anomaly in the MSLP. The GISS-E2-R model differs from GISS-E2-H in that its response is weaker, and not statistically significant. The only difference between the GISS-E2-H and GISS-E2-R models is the ocean model to which the atmosphere is coupled. GISS-E2-R uses the ModelE atmospheric code and is coupled to the Russell ocean model (1° × 1.25° L32), while GISS-E2-H uses the same ModelE atmospheric code but is coupled to the Hycom ocean model (1° × 1.25° L26) [Schmidt et al., 2006]. In a modeling study on the effects of volcanic eruptions on the oceans, Stenchikov et al. [2009] reported changes in sea level, temperature, ocean heat content, salinity, and also significant strengthening of the Atlantic Meridional Overturning Circulation (AMOC) at 40–60°N in the first few years following an eruption. While it is therefore possible that part of the surface response could be due to changes in NH ocean circulation, AMOC changes in particular are generally believed to be caused by the changes in wind stress due to the positive NAO [Delworth and Dixon, 2000] that results from a stronger vortex following a volcanic eruption [Stenchikov et al., 2009], rather than the ocean affecting the surface to cause DJF warming for up to two years following a volcanic eruption. Therefore it is unlikely that the response seen in GISS-E2-H, which differs slightly from that of GISS-E2-R, particularly with no strong positive NAO, is due to an activation of the volcanic mechanism.
 The analysis of surface temperature and MSLP in the CMIP5 ensemble shows a poor correspondence with observations during the first two NH winters following large tropical eruptions. No improvement is seen with respect to the findings of S06 based on a selection of seven models participating in CMIP3.
3.3. Geopotential Height
 Geopotential height anomalies in the upper troposphere and mid stratosphere help define circulation changes during winters following large volcanic eruptions. Due to the high uncertainty in the 20CRv2 reconstructions of upper air fields [Compo et al., 2011], we decide to analyze only the last four eruptions since 1950 using the ERA40 data set. In the upper troposphere (Figure 4e), the observed 200 hPa geopotential height anomalies are linked to the MSLP anomalies, with a general decrease over the North Pole surrounded by positive geopotential height in the mid latitudes and a strong dipole over the North Atlantic region. A general decrease in the observed geopotential height dominates at low latitudes, consistent with the generalized cooling tendency observed in the tropical troposphere.
 In observations the anomaly pattern in the troposphere is mirrored in the stratosphere by a cold and deep polar night vortex, as observed in the 50 hPa geopotential height anomalies (Figure 4g) showing a large statistically significant decrease in geopotential height over the pole of around 200 m. A weaker anomaly at 50 hPa is observed at low latitudes, with a geopotential height increase of about 25 m which has been attributed to the direct heating effect of the volcanic aerosol in the lower tropical stratosphere [Ramachandran et al., 2000; Stenchikov et al., 1998]. The observed low 50 hPa geopotential height at high latitude is associated with a colder polar lower stratosphere, which suggests a stronger and persistent polar vortex. Recent studies suggest that this might be a characteristic of the early stage of the post-volcanic winter season. For example, Graf et al. saw no clear weakening of the wave activity during post-volcanic winter in observations and Mitchell et al.  show that the observed polar vortex in the upper stratosphere is weaker than normal from the end of January into February after the three major volcanic eruptions since 1960.
 As for the MSLP, the modeled geopotential height anomalies at 200 hPa are highly variable (Figure 8). Most models simulate a significant uniform decrease in the geopotential height roughly south of 30°N, as can be seen in the multimodel composite (Figure 4f), stronger than in the observations. The strongest anomaly is observed for the GFDL model. A significant uniform decrease over the Pole is observed only for MRI.
 A few of the models capture the anomalies observed in the stratosphere (see Figure 9) as in the reanalysis, though much weaker. HadGEM2, MPI, CNRM-CM5 and MRI simulate a decrease in the geopotential height of the order of 25 m, although such a response is not a substantial change with regard to the background variability of the polar vortex. Thompson and Wallace  noted that over 1958–1997, as observed in ERA40, the leading EOF of 50 hPa wintertime geopotential height anomalies, which accounts for about 50% of the variance, is around −270 m. Other models show no significant anomaly at high latitudes. As observed from the multimodel mean (Figure 4h), the most robust feature in the stratosphere is a statistically significant increase in the geopotential height at low latitude in agreement with the observations. This is weaker than in the ERA40 composite (see Figure 5d) and is likely due to the stronger cooling simulated in the tropics (Figure 4b) which tends to shrink the atmospheric column, as noted in S06.
 As with temperature and MSLP, the difference in the anomalies of 50 hPa and 200 hPa geopotential height between the multimodel mean and the observations (Figures 5c and 5d) is highly significant and confirms the difficulty models have in simulating the observed circulation changes in the stratosphere and upper troposphere.
3.4. NAO Index
 As noted in section 3.2, the observed anomalies in the MSLP in the post-volcanic winters are not well reproduced by the CMIP5 models. The observed MSLP anomalies in the winters after the largest volcanic eruptions since 1880 project onto the leading variability mode of the NH circulation, especially the NAO index, with a significant prevalence of positive NAO in the first winter after the eruption [Fischer et al., 2007], both in terms of amplitude and number of positive events [Christiansen, 2008].
 In this section we test whether looking at the principal modes of variability can help to better isolate the dynamic response in the model simulations. As mentioned in section 2, we use the same time convention adopted by Fischer et al.  and S06 to identify the 1st and 2nd winter after each eruption. The majority of the volcanoes erupted in the spring-early summer but two erupted in autumn, the minor eruption of Fuego in October 1974 and the large eruption of Santa María at the end of October 1902. It is likely that their full effect won't be present in the first winter immediately after the eruption and therefore the first winter should be considered to be a full year after the eruption time, as listed in our Table 2. This differs from the time convention adopted by Christiansen  who considered the first winter immediately after the eruption for all the volcanoes, hence changing the years of winters considered for the two eruptions of Fuego and Santa María. In his paper he reported the robustness of his results when those two eruptions are excluded from the analysis. However, we show here that with the different dating convention the results are affected when these two eruptions are included.
 When all nine eruptions south of 40°N as listed in Table 2 are included, the 20CRv2 shows a clear prevalence of positive NAO index in the first year after the eruptions (Figure 10, 20CRv2, lag 0). The amplitude is significant roughly at the 4% level with seven volcanoes out of nine with positive NAO in the first winter and this occurrence is significant at the 9% level. No significant signals are observed for the second post-volcanic winter.
 Only two post-volcanic winters show a negative NAO, after the eruptions of Agung and Quizapu, both of which erupted in the Southern Hemisphere. Agung's aerosol was mostly concentrated south of the Equator [Robock, 2000], and Quizapu has the weakest effect on the stratospheric optical depth and temperature between 30°S and 30°N among all the analyzed volcanoes (Figures 2 and 3). This could affect the dynamics associated with the forcing of the NAO circulation. Our results are unchanged if we exclude the Quizapu eruption from the volcanoes used in the composite. We also note that, although positive, the winter 1903–04 after the Santa María eruption has an NAO signal close to zero (0.03, also consistent in the DJFM composite with −0.04 as confirmed in Christiansen [2008, Figure 2]), which further reduces the number of occurrences of positive NAO events in the first winter after an eruption.
 Among the 13 models analyzed in this study, a positive NAO signal at lag 0 is observed only for GISS-E2-R (at the 7% significance level) and CanESM2 (at the 3% significance level). Only CNRM-CM5 shows a significant number of positive NAO events at lag 0 (52/90, p = 0.07), but the composite amplitude reaches only the 11% significance level. The analysis is confirmed by the MSLP gridded anomalies shown in Figure 7, where CanESM2 also shows a weak NAO-like dipole when averaged across two winter seasons. MRI-CGCM3 is the only model that shows a significant number of positive NAO events in the second winter after the eruptions (p = 0.08), but the model appears to have a positive NAO at all lag times, so it is not clear whether this response is necessarily associated with the volcanic eruption.
 The other models show no significant positive anomaly at lag 0, but many spurious signals are detected at various lags for different models. CSIRO-Mk3.6 displays a negative NAO at lag 0, while other models (NorESM1-M and CCSM4) show a negative NAO at lag -1. HadCM3 and CNRM-CM5 detect a positive NAO at lag -3: the signal could partially be due to the degenerate contribution of the Krakatau eruption, which happened 3 years before the 1886 eruption of Tarawera and shows a positive NAO in both of these models (not shown). Finally, strong signals are displayed by HadGEM2-ES at lag -1 and NorESM1-M at lag +1: such signals could be unphysical and occur by chance, or they could depend on a periodicity sampled in the epoch analysis at the same frequency as the volcanic signal. We have not analyzed in detail the origin of the spurious results of these two models.
 As mentioned above, when a different convention is used to identify the closest winters affected by the eruptions of Santa María and Fuego, changes are observed in the superposed epoch analysis. Figure 11 examines the robustness of the analysis with respect to the choice of the winters after Santa María and Fuego, using the convention adopted in Christiansen. Since the reanalysis is based on a limited sample, it proves to be highly sensitive to changes in the epoch key date. The signal at lag 0 now becomes highly significant (1% level), with an occurrence of 7 positive NAO out of 9 events (p = 0.09). Most of the change in the signal comes from the Santa María event, which shows a strong positive NAO in the winter 1902–1903, immediately after the eruption, and so enhances the epoch composite at lag 0.
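 The sensitivity to the key date can be illustrated with a minimal superposed epoch composite. This is a sketch under stated assumptions: the function name is hypothetical, and the index values are invented toy numbers chosen only to show the mechanics, not reanalysis or model data.

```python
def epoch_composite(index_by_year, key_years, lags=range(-3, 4)):
    """Average index_by_year[key + lag] over all key years, separately for each lag.

    Key years whose lagged year is missing from the record are skipped for that lag.
    """
    composite = {}
    for lag in lags:
        values = [index_by_year[k + lag] for k in key_years if (k + lag) in index_by_year]
        composite[lag] = sum(values) / len(values) if values else float("nan")
    return composite


# Toy index: zero everywhere except one strongly positive winter (invented values).
toy_index = {year: 0.0 for year in range(1895, 1915)}
toy_index[1902] = 2.0

delayed = epoch_composite(toy_index, key_years=[1903, 1906])
immediate = epoch_composite(toy_index, key_years=[1902, 1906])
print("lag 0, delayed key date:  ", delayed[0])
print("lag 0, immediate key date:", immediate[0])
print("lag -1, delayed key date: ", delayed[-1])
```

 In this toy case, moving a single key date by one year shifts the whole signal from lag -1 to lag 0, which is the kind of behaviour to expect when the composite rests on only a handful of events.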
 The largest effect of the change of the year of the first winter after the eruptions of Santa María and Fuego is observed for HadGEM2, which does not detect any significant signal at any lag. With 10 ensemble members, CNRM-CM5 is the only model that still detects a positive NAO at lag 0. The amplitude is small but slightly more significant than in the previous composite (it now reaches the 10% level of significance), and the number of events is significant (56/90, p = 0.02). Among the other models, only MRI-CGCM3 detects a significant number of positive events at lag 0 (18/27, p = 0.08) but, as noted before, the model tends to show a positive NAO at almost all lags. Although this model shows the strongest decrease of the geopotential height at high latitudes at both 50 hPa and 200 hPa, this does not seem to be enough to reproduce a significant NAO signal or surface temperature anomaly.
 The main conclusions of this section are (1) the superposed epoch analysis of the 20CRv2 NAO index confirms previous findings of a positive NAO in the first winter following the major tropical eruptions in the 19th and 20th century, but the strength of the signal is sensitive to the choice of the key dates for each eruption, which points to the sparseness of observations hampering our understanding of processes; and (2) as observed in the previous sections, the models struggle to reproduce a detectable positive NAO signal in the first post-eruption winter. With 10 ensemble members, the CNRM-CM5 model results are the most robust to changes in the definition of the post-volcanic key dates. With fewer ensemble members, the other models show sensitivity to the definition of the key dates. We finally note that, since in this work we only analyzed the ensemble of CMIP5 historical runs, the bootstrap distribution might give a conservative estimate of the significance associated with the signal. Clearer signal detection could be achieved by drawing the random matrix from the CMIP5 control simulations, therefore relying only on natural variability not influenced by volcanoes or other forcings.
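 The role of the null distribution can be sketched with a simple resampling test. The exact bootstrap scheme used for the figures is not described in this section, so the code below is an assumed, simplified variant: it draws random composites with replacement from a pool of index values and counts how often they reach the observed composite. The function and argument names are hypothetical.

```python
import random


def bootstrap_pvalue(event_values, null_pool, n_resamples=2000, seed=0):
    """Estimate P(random composite >= observed composite) under a resampling null.

    event_values: index values in the post-eruption winters (one per event).
    null_pool: index values to resample from, e.g. all winters of a run.
    """
    rng = random.Random(seed)
    n = len(event_values)
    observed = sum(event_values) / n
    exceed = 0
    for _ in range(n_resamples):
        sample = rng.choices(null_pool, k=n)  # draw with replacement
        if sum(sample) / n >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (exceed + 1) / (n_resamples + 1)
```

 Passing winters from a historical run as `null_pool` mixes forced and unforced variability into the null distribution, whereas passing winters from a control run would restrict it to internal variability, which is the distinction drawn in the paragraph above.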
4. Discussion and Conclusions
 All available models submitted to the CMIP5 archive as of April 2012 that had a reasonably realistic representation of volcanic eruptions and a sufficient number of samples have been analyzed for their ability to simulate post-volcanic radiative and dynamic responses. Given the substantially different dynamics among the models, we hoped to find at least one model simulation that was dynamically consistent with observations, showing improvement since S06. Disappointingly, we found that again, as with S06, despite relatively consistent post-volcanic radiative changes, none of the models manages to simulate a sufficiently strong dynamical response. Although all the models reproduce reasonably well the increase in geopotential height in the lower stratosphere at low latitudes, none of the models simulates a sufficiently strong reduction in the geopotential height at high latitudes, and correspondingly the MSLP and temperature fields show major differences with respect to the observed anomalies. This is despite some models having 10 ensemble members, giving a potentially strong signal-to-noise ratio.
 It is unclear why models fail to simulate the dynamics following volcanic eruptions. The dynamical mechanism proposed by Stenchikov et al. [2002, Figure 13] involves lower stratosphere tropical heating caused by the presence of volcanic aerosols, which gives rise to a stronger polar vortex through the thermal wind relationship. A stronger vortex could also be due to a decrease in planetary wave forcing from the troposphere, although the evidence for this is unclear. The modeling results of Stenchikov et al. showed a decreased EP flux into the stratosphere following the Pinatubo eruption, but observations suggest an increase in the EP flux following the Agung, Fuego, El Chichón and Pinatubo eruptions [Graf et al., 2007]. S06 suggested that models might be biased toward an unrealistically strong polar vortex, which results in a weak wave feedback between stratosphere and troposphere. From column three of Table 3 we observe a large variability among the 13 models in their climatological 50 hPa zonal wind at high latitude. Some models have stronger zonal winds compared to ERA40, but their response to volcanic forcing does not differ from what is observed for the models characterized by a lower climatological wind. Although this does not confirm the findings of S06, which were based on a limited number of models, we also notice that all models show considerably less variability in high-latitude stratospheric winds than observed, suggesting a stable polar vortex and more resistance to changes from external forcings, as found by S06.
 There are therefore still uncertainties in the dynamical mechanisms following volcanic eruptions, particularly regarding the wave propagation through the polar stratosphere as seen in EP flux diagnostics [Graf et al., 2007].
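 For reference, the thermal wind relationship invoked in the mechanism above can be written, for the zonal geostrophic wind on pressure surfaces, in the standard textbook form below. The symbols are conventional notation rather than the paper's own: u_g is the zonal geostrophic wind, p pressure, T temperature, y the northward coordinate, R the gas constant for dry air and f the Coriolis parameter.

```latex
\frac{\partial u_g}{\partial \ln p} = \frac{R}{f} \left( \frac{\partial T}{\partial y} \right)_{p}
```

 In the Northern Hemisphere f is positive, so aerosol heating of the tropical lower stratosphere, which makes the equator-to-pole temperature gradient more negative, implies westerly winds that increase more strongly with height (decreasing pressure), consistent with a stronger polar vortex.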
 In addition, the degree of El Niño influence and interaction following volcanic eruptions is unknown. Based on the superposed epoch analysis of post-volcanic winters stratified according to the ENSO phase, Christiansen concluded that ENSO does not change the impact of volcanic eruptions on the Northern Hemisphere winter circulation, although the low number of cases imposes caveats on the conclusions. A recent work [Graf and Zanchettin, 2012] argues that ENSO has a different effect on the Northern Hemispheric winter circulation when the differences between Central-Pacific (CP) and East-Pacific (EP) El Niño events are taken into account. In particular, CP El Niño events appear to have a significant effect on winter NH circulation, with a tendency toward a negative NAO index. According to their definition, CP El Niño occurred in 1963–1964 and 1991–1992 but not in 1982–1983, which could explain the strong Eurasian warming signal observed after El Chichón, even though a strong El Niño event was taking place, and the relatively disturbed vortex in January 1992 [Graf et al., 2007]. Moreover, biases in model representations of ENSO variability [Guilyardi, 2006] could in the same way affect their response to volcanic forcing. The issue is also complicated by the intrinsic problems in defining the modes of ENSO variability [Takahashi et al., 2011]. In our analysis the large number of ensemble members should help to smooth out possible contaminations induced by the Pacific SST variability. Despite this, the models have a tendency to be in a small negative ENSO phase, indicative of a weak La Niña phase. However, this should not lead to a weakening of the volcanic response in the models. While Manzini et al. [2006] saw in model simulations that during the El Niño phase there was an increase in the vertical propagation of quasi-stationary planetary waves into the stratosphere from the troposphere, which caused a weaker, more disturbed vortex, during the La Niña phase they noticed no influence distinguishable from variability. Further studies using observations and model data have reached similar conclusions [Garcia-Herrera et al., 2006; Calvo et al., 2009]. Despite the model performance, the 20CRv2 reanalysis data set, which uses HadISST sea surface temperatures, yields an averaged ENSO 3.4 index of 0.07 during the volcanic eruptions analyzed here. It has also been suggested that large volcanic eruptions could actually trigger a positive phase of ENSO. Tung and Zhou performed linear regressions on the HadISST and the Extended Reconstructed SST (ERSST) data sets. While they find a weakly negative volcanic temperature response from linear regressions of the HadISST and ERSST data sets using the ST data set as the volcanic signal, they find a large positive ENSO-like pattern if the cold tongue index is assumed not to be independent of volcanoes in the linear regression. Their findings, independent of the choice of volcano index, suggest a statistically significant El Niño response to a volcanic eruption in observations.
 While uncertainty still remains on the interactions between volcanoes and ENSO, the DJF warming signal is seen independently of the choice of volcanoes: the last four major eruptions, the last nine as used here, and longer-term reconstructions of temperature from 1600 [Shindell et al., 2004] and over the past half millennium [Fischer et al., 2007] all reveal a statistically significant DJF warming following major volcanic eruptions, which, as noted by Marshall et al., is extremely unlikely to be an artifact of internal variability. Despite this, we also calculated the DJF temperature anomaly for the five biggest volcanoes (Krakatau, Santa María, Agung, El Chichón and Pinatubo) and for the four best observed volcanoes that erupted in the satellite era (Agung, Fuego, El Chichón and Pinatubo), for all the models and the observations. Although the observations show, independent of these choices, a strong statistically significant warming, none of the models successfully simulates the observed response. GISS-E2-H shows a slightly increased DJF warming pattern, yet further investigation of the MSLP anomalies reveals no positive NAO that is either large or statistically significant anywhere. bcc-csm1.1 also shows a small increase in surface temperature over the Eurasian region, yet the spatial response is not correct. Moreover, there is almost no statistical significance in the bcc-csm1.1 temperature fields over the Eurasian region, and further investigation of this model reveals neither a positive nor a significant NAO signal.
 Finally, Stenchikov et al. found that including the Quasi-Biennial Oscillation (QBO) in the model made a substantial difference to the volcanic impact on the vortex. They found in observations following the Pinatubo eruption that the vortex was strengthened more in the second winter than the first, despite more aerosol being present in the stratosphere in the first winter. They proposed that this could be explained by the QBO being in the East phase in the first winter, which tends to weaken the vortex, and in the West phase in the second winter, which tends to strengthen it. They concluded that a model with a QBO in the correct phase could better represent the dynamical simulation of the Pinatubo eruption. We note here that none of the models tested has a QBO, as can be seen from Table 3 in the low standard deviation of the climatological winter 50 hPa zonal wind over the equator, and this could affect the performance of the dynamical simulation.
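 The diagnostic mentioned above, the interannual standard deviation of the winter-mean equatorial 50 hPa zonal wind, can be sketched as follows. The function name is hypothetical and both wind series are invented for illustration; they are not values from Table 3, from any model, or from a reanalysis.

```python
from statistics import stdev


def interannual_std(winter_mean_winds):
    """Sample standard deviation of winter-mean equatorial zonal wind (m/s)."""
    return stdev(winter_mean_winds)


# Invented illustrative values, not model or reanalysis output.
with_oscillation = [-15.0, 8.0, -12.0, 10.0, -14.0, 9.0]    # alternating easterly and westerly winters
without_oscillation = [-4.0, -5.0, -4.5, -3.5, -5.0, -4.0]  # persistently weak easterlies

print(f"with oscillation:    {interannual_std(with_oscillation):.1f} m/s")
print(f"without oscillation: {interannual_std(without_oscillation):.1f} m/s")
```

 A series that alternates between easterly and westerly winters yields a much larger standard deviation than one that stays near a single value, which is why a low value of this statistic is read as the absence of a QBO.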
 Another factor that could account for the poor simulation of the dynamical response following a volcanic eruption is how the aerosol is imposed in the model. We note that it is typical for a model to employ a very crude representation of aerosol in four latitude bands [Marshall et al., 2009], and the question of the suitability of this aerosol representation has been raised before [Otterå, 2008; Marshall et al., 2009]. Another reason for the “common failure” of models to simulate the dynamics following volcanic eruptions may be their representation of the AO. Otterå notes that models may have a general basic inadequacy that does not allow a sufficiently strong AO response to large-scale forcing. Others have pointed to ozone as being an important factor [Stenchikov et al., 2002; Otterå, 2008]; however, as noted by Marshall et al., the response to the past major eruptions (before major ozone loss and larger amounts of ozone-destroying chlorine in the atmosphere) is similar to that of El Chichón and Pinatubo combined, which suggests that inclusion of ozone chemistry is unlikely to be a major factor in the simulation of a volcanic eruption.
 The impact of volcanic eruptions on surface climate is the closest natural analogue to sulfate aerosol geoengineering, despite the differences in injection method and duration of the perturbation. Unlike sulfate aerosol geoengineering, the ability of models to accurately reproduce the response to volcanic eruptions can be tested against observations. Although geoengineering would likely produce a more uniform profile of aerosol in the stratosphere than volcanic eruptions do, the results of GCM simulations of stratospheric geoengineering need to be considered in the light of the models' limitations in certain aspects of their responses to volcanic eruptions. This is of concern not only for the temperature response, but also for the precipitation response, as the dynamical effects following an eruption can often overwhelm the radiative response [Anchukaitis et al., 2010]. Accordingly, research into the climate response to volcanic eruptions and their simulations is an area of major importance, not only in its own right, but also for stratospheric aerosol geoengineering.
 Simon Driscoll acknowledges financial support from the SPICE (Stratospheric Particle Injection for Climate Engineering) project, jointly funded by the UK EPSRC (Engineering and Physical Sciences Research Council) and NERC (Natural Environment Research Council). Lesley Gray is funded by the UK NERC National Centre for Atmospheric Science (NCAS) Climate Directorate. Alessio Bozzo has been jointly supported by NCAS and the NSF grant ATM-0296007 and acknowledges the support of the SAGES Centre for Earth System Dynamics at the University of Edinburgh. Alan Robock is supported by NSF grant ATM-0730452. We acknowledge the World Climate Research Programme's Working Group on Coupled Modeling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. For CMIP the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.