Climate simulations suggest that strong tropical volcanic eruptions (SVEs) induce decadal dynamical responses in the coupled ocean-atmosphere system, which protract the climate recovery beyond the short-lived radiative forcing. Here, for the first time, we diagnose the signature of such responses in European seasonal climate reconstructions over the past 500 years. The signature consists of a decadal-scale positive phase of the winter North Atlantic Oscillation accompanied by winter warming over Europe peaking approximately one decade after a major eruption. The reconstructed delayed winter warming is compatible with formerly suggested mechanisms behind simulated SVE-driven climate responses, thus corroborating the existence of SVE-driven decadal climate variability. Historical climate-state uncertainty may, however, hamper unambiguous statistical and dynamical assessments both for multiple and for individual SVEs.
 Strong tropical volcanic eruptions (SVEs) impose natural, short-term (1–2 years) energy imbalances on the climate system resulting in temporary, strong near-surface global cooling [Robock, 2000]. Robust short-term radiative and dynamical responses to SVEs have been detected in both climate reconstructions [Crowley, 2000; Fischer et al., 2007; Hegerl et al., 2011] and simulations [Jungclaus et al., 2010; Otterå et al., 2010; Hegerl et al., 2011; Zanchettin et al., 2012a] of the last millennium. SVEs are thus a major natural factor determining interannual climate variability. Modifications of the oceanic thermohaline circulation initiated by the SVE-driven radiative cooling and sustained by ocean-atmosphere feedbacks [Otterå et al., 2010; Mignot et al., 2011; Miller et al., 2012; Wang et al., 2012; Zanchettin et al., 2012a] can considerably protract the recovery of the climate system from volcanic perturbations [e.g., Church et al., 2005; Gleckler et al., 2006].
 Both short-term [Hegerl et al., 2011] and decadal-to-multidecadal [Mignot et al., 2011; Miller et al., 2012; Zanchettin et al., 2012a,2012b] dynamical responses to SVEs are, however, only partially constrained by the imposed forcing. Observed or simulated forced responses to a small sample of eruptions can therefore be in principle elusive, particularly in the noisy atmospheric and near-surface components. Nonetheless, typical post-eruption decadal features can be identified in climate model simulations when a large number of SVEs are investigated in ensemble analysis. These include positive anomalies of the winter North Atlantic Oscillation (NAO) index [Wang et al., 2012; Zanchettin et al., 2012a] and warm winter surface (2 m) air temperature (SAT) anomalies over the continental Northern Hemisphere peaking approximately one decade after the eruption [Zanchettin et al., 2012a].
Zanchettin et al. [2012a] provided a dynamical framework to interpret this delayed winter warming (DWW) based on ensemble Earth-system-model (ESM) simulations of the last millennium [Jungclaus et al., 2010]. Their proposed compatible processes include anomalously strong ocean heat release over the Arctic Ocean related to decadal modifications in the North Atlantic oceanic circulation [see also Stenchikov et al., 2009; Otterå et al., 2010] and strong signal amplification in the Arctic [see also Miller et al., 2012] influencing the westerly atmospheric circulation across the North Atlantic/European sector. In the following, we use European climate reconstructions [Luterbacher et al., 2002, 2004] to substantiate the so far model-based hypothesis of decadal volcanically forced climate variability devoting particular attention to the statistical significance of the reconstructed DWW signals. We thus demonstrate that the DWW provides a reliable perspective for interpreting the decadal evolution of past European climate. We further discuss how the DWW can represent a benchmark for testing the ways the consistency between reconstructed and simulated climate variability is assessed.
2 Data and Methods
 European seasonal SAT reconstructions are from Luterbacher et al. [2004]. Seasonal sea level pressure (SLP) and 500 hPa geopotential height (Z500) reconstructions over the Eastern North Atlantic and Europe and the reconstructed winter NAO are from Luterbacher et al. [2002]. These statistical reconstructions, particularly after the late 17th century, are mainly based on a combination of documentary evidence and long station data.
 We use the COSMOS-Mill simulations covering the period A.D. 800–2005 [Jungclaus et al., 2010] performed with the ECHAM5-MPIOM ESM developed at the Max Planck Institute for Meteorology. The ensemble consists of eight simulations: five full-forcing (natural and anthropogenic) simulations with solar forcing based on an estimate with relatively low long-term variability and three full-forcing simulations with estimates of more strongly varying low-frequency solar forcing. Additionally, the corresponding 3100-year control run is used to evaluate statistical significance. Details of the simulations, including the implementation of volcanic forcing, are available from Jungclaus et al. [2010]. The ensemble complies with the “Paleoclimate Modelling Intercomparison Project Phase III” requirements; it compares well with reconstructions of multidecadal North Atlantic sea-surface-temperature variability during periods dominated by external forcings [Zanchettin et al., 2012b], and it is climatologically and probabilistically consistent with reconstructed annual central European mean temperatures for the last ~500 years [Bothe et al., 2012]. DWW features and their dynamical interpretation were first proposed based on the weakly varying solar forcing ensemble used here [Zanchettin et al., 2012a].
 A superposed epoch analysis [e.g., Fischer et al., 2007] is performed for the nine SVEs listed in Table 1. The selected SVEs occurred during the time period covered by the reconstructions, therefore differing from those in Zanchettin et al. [2012a] spanning the period A.D. 800–1900. Eruptions during the most recent decades are excluded to avoid inclusion of spurious signals due to background warming conditions. We study 5-year delayed post-eruption anomalies evaluated with respect to the pre-eruption climatology, defined as the average state over the decade preceding the eruption.
Table 1. Characteristics of the Investigated SVEs (a)
Columns: Date of Eruption; Annual Top-of-Atmosphere Forcing (W m−2); First Post-Eruption Winter (ref. Jan); First Post-Eruption Summer
Eruptions (partial list): Serua/Banda Api (Indonesia); Santa Maria (Guatemala)
(a) Top-of-atmosphere forcing estimates are based on the COSMOS-Mill volcanic forcing-only simulation [Jungclaus et al., 2010]. The last two columns indicate respectively our definition of the first post-eruption winter (December-January-February) and of the first post-eruption summer (June-July-August).
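The superposed epoch analysis described above can be illustrated with a minimal sketch. It assumes a yearly seasonal series stored as a plain list, takes lag 0 to be the first post-eruption season, averages five consecutive seasons starting at the chosen lag, and subtracts the mean of the ten preceding seasons. The function names, the synthetic series, and the event years are hypothetical and serve illustration only; they are not the authors' code or data.

```python
from statistics import mean

def delayed_anomaly(series, i, lag, window=5, baseline=10):
    """Mean of `window` values starting `lag` steps after index i,
    minus the mean of the `baseline` values preceding index i."""
    if i - baseline < 0 or i + lag + window > len(series):
        raise ValueError("event too close to the edge of the series")
    return mean(series[i + lag:i + lag + window]) - mean(series[i - baseline:i])

def composite(series, first_year, event_years, lag, window=5, baseline=10):
    """Superposed epoch composite: average the delayed anomaly over events."""
    return mean(
        delayed_anomaly(series, y - first_year, lag, window, baseline)
        for y in event_years
    )

# Synthetic winter series (not real data): zero everywhere except a +1 step
# in winters 9-13 after each of two made-up events.
first_year = 1500
series = [0.0] * 100
events = [1520, 1560]          # hypothetical first post-eruption winters
for y in events:
    for k in range(8, 13):
        series[y - first_year + k] = 1.0

print(composite(series, first_year, events, lag=8))   # delayed window
print(composite(series, first_year, events, lag=0))   # immediate window
```

In this toy setup the composite at lag 8 picks up the imposed step while the composite at lag 0 does not, which is the behavior the lagged window is meant to isolate.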
 The statistical significance of both reconstructed and simulated anomalies is assessed for all our inferences. Significance is estimated based on the likelihood of a random occurrence of the signals [see, e.g., Hegerl et al., 2011; Graf and Zanchettin, 2012]. Specifically, the signal obtained for the selected SVEs is compared to that obtained by randomly sampling n years from the full period, including those of the selected SVEs, where n = 9 individual eruptions for the reconstructions and n = 9 × 8 = 72 simulated events for the simulation ensemble. Five hundred sequences of these random events are evaluated over the available temporal domain and otherwise treated in exactly the same manner as real volcanic events. For reconstructions, random sequences are sampled from the whole reconstructed period. For simulations, they are sampled from the whole length of the control run. Autocorrelation is therefore preserved in the estimation of significance. Percentile intervals of the anomaly distribution obtained from the randomization are used to evaluate the confidence levels associated with a chance occurrence of the signal.
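The randomization test can be sketched as follows. This is a minimal illustration under stated assumptions: pseudo-event years are drawn without replacement (the text does not specify this), the series is a synthetic white-noise stand-in for a reconstruction, and the "observed" composite is a made-up number. The helper names are hypothetical.

```python
import random
from statistics import mean

def delayed_anomaly(series, i, lag=8, window=5, baseline=10):
    return mean(series[i + lag:i + lag + window]) - mean(series[i - baseline:i])

def null_composites(series, n_events, n_draws=500, lag=8, window=5,
                    baseline=10, seed=0):
    """Sorted composites for n_draws random sets of n_events pseudo-events."""
    rng = random.Random(seed)
    # Only indices with a full baseline before and a full window after.
    valid = range(baseline, len(series) - lag - window + 1)
    draws = []
    for _ in range(n_draws):
        picks = rng.sample(valid, n_events)   # assumption: no replacement
        draws.append(mean(delayed_anomaly(series, i, lag, window, baseline)
                          for i in picks))
    return sorted(draws)

def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    k = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[k]

# Synthetic white-noise "reconstruction" of 500 winters (not real data).
noise_rng = random.Random(1)
series = [noise_rng.gauss(0.0, 1.0) for _ in range(500)]
null = null_composites(series, n_events=9)
low, high = percentile(null, 2.5), percentile(null, 97.5)
observed = 0.8   # hypothetical composite anomaly for the real events
print(low, high, observed > high)
```

Because each pseudo-event is processed over contiguous years of the original series, the serial correlation within each baseline and response window is retained in the null distribution.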
 Uncertainties in DWW signals concern the dating of the eruptions and the lag of the peak DWW, the latter accounting for variability in the duration of the post-eruption fluctuation. Timing uncertainties are assessed by mapping the average response yielded by 100 sets of randomly adjusted eruption dates and lags, where for each individual eruption an integer i drawn from a random uniform distribution encompassing the range [−1:1] (for eruption dates) and [−2:2] (for lags) was added to the original value. Robustness of the ensemble composite statistics is assessed through a leave-one-out analysis, i.e., by performing a set of composite analyses where each listed eruption is iteratively excluded from the calculation in order to ensure that the result is not due to a large event following a single eruption. The weakest diagnosed local anomalies represent the lower confidence bound on the signal. The significance test for the pattern including timing uncertainty and the pattern assessing ensemble robustness follows the randomization approach described above (n = 8 for the leave-one-out analysis).
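The two robustness checks can be sketched in a few lines. The sketch assumes independent integer offsets per eruption, drawn uniformly from [−1, 1] for dates and [−2, 2] for lags as stated above, and takes the "weakest" leave-one-out anomaly to be the one with the smallest magnitude, which is an interpretation rather than something the text spells out. Eruption years and anomaly values are made up.

```python
import random
from statistics import mean

def jitter_events(dates, lag, rng, date_range=1, lag_range=2):
    """One randomized set: per-eruption (date, lag) with integer offsets."""
    return [(d + rng.randint(-date_range, date_range),
             lag + rng.randint(-lag_range, lag_range)) for d in dates]

def leave_one_out(values):
    """Composite means obtained by excluding each event in turn."""
    return [mean(values[:k] + values[k + 1:]) for k in range(len(values))]

rng = random.Random(0)
dates = [1510, 1550, 1590]   # hypothetical eruption years, illustration only
sets = [jitter_events(dates, lag=8, rng=rng) for _ in range(100)]

# Per-event anomalies at one grid point (made-up numbers).
anomalies = [1.0, 2.0, 3.0]
loo = leave_one_out(anomalies)
weakest = min(loo, key=abs)   # smallest-magnitude composite (assumption)
print(loo, weakest)
```

Each of the 100 jittered sets would then be passed through the same composite calculation, and the resulting maps averaged.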
 In comparing (ensemble-)simulated and reconstructed DWW anomaly patterns, we propose a simple assessment of probabilistic consistency in addition to the common approach using the ensemble average. A simple rank-based score allows assessing whether the ensemble distribution of simulated DWW signals provides a reliable estimate of the likelihood of occurrence of a reconstructed event. Hamil provides a general overview of rank histograms and their interpretation; Bothe et al. [2012] discuss their application to the COSMOS-Mill simulations. The score is evaluated for each grid point as follows: ranks for the reconstructed anomaly are computed for each individual SVE separately, i.e., nine ranks are calculated. A chi-square goodness-of-fit test is then performed to assess whether the obtained ranks were likely drawn from a uniform distribution, i.e., whether there is equal probability that the ensemble underestimates, overestimates, or accurately represents a reconstructed event. The average of the nine ranks is used for the plotting.
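A minimal sketch of the rank-based score at a single grid point follows. It assumes one rank class per possible position of the reconstructed value among the eight ensemble members (nine classes) and ignores ties; the binning actually used for the test is not stated in the text, so this choice is an assumption. All numbers are made up, and only the chi-square statistic is computed here; a p value could be obtained from a chi-square distribution, for instance with `scipy.stats.chisquare`.

```python
from statistics import mean

def rank_in_ensemble(reconstructed, members):
    """Rank from 1 (below all members) to len(members) + 1 (above all)."""
    return 1 + sum(1 for m in members if m < reconstructed)

def rank_counts(ranks, n_members):
    """Histogram of ranks over the n_members + 1 possible classes."""
    counts = [0] * (n_members + 1)
    for r in ranks:
        counts[r - 1] += 1
    return counts

def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts in every class."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Eight made-up ensemble anomalies for one event at one grid point.
members = [0.1, 0.4, 0.2, 0.5, 0.3, 0.6, 0.0, 0.7]
one_rank = rank_in_ensemble(0.45, members)

# Nine made-up ranks, one per eruption.
ranks = [6, 9, 4, 7, 8, 5, 9, 3, 6]
counts = rank_counts(ranks, n_members=8)
statistic = chi_square_uniform(counts)
print(one_rank, counts, statistic, mean(ranks))
```

Ranks clustering at the upper end would indicate that the ensemble tends to underestimate the reconstructed anomaly, and ranks clustering at the lower end that it tends to overestimate it.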
 Figure 1 shows the reconstructed post-eruption evolution of the NAO and of the field-average European SAT for winter and summer seasons. In winter (Figure 1a), a significant positive NAO anomaly develops after the eruption and persists throughout the first post-eruption decade, with highly significant values about one decade after the eruption. Consistently, significant warm winter SAT anomalies are found about one decade after the eruption: the DWW. In summer (Figure 1b), only weak changes are diagnosed in the dominant circulation on decadal timescales. Summer SAT anomalies depict a weak decadal fluctuation, with nonsignificant peak anomalies.
 Figure 2a illustrates the DWW pattern at its strongest manifestation at lag 8, i.e., including the 9th–13th post-eruption winter anomalies. Significant warm winter SAT anomalies spread over northern Europe, and they are strongest in the western Baltic Sea coastal region (about +1.2 K). The pattern is robust against the SVE selection (see section 'Data and Methods') and against timing uncertainties (Figures 2b and 2c). The detection of DWW is also robust against using different numbers of volcanic events, such as three different thresholds of eruption size and type used in Hegerl et al. [2011] (not shown). Defining the pre-eruption climatology as the average state over the last five instead of the last 10 pre-eruption years leads to an even stronger DWW (not shown). The delayed summer SAT anomaly pattern (Figure 2d) entails peak warm anomalies of about +0.3 K that are only locally significant and fail the robustness tests.
 The reconstructed anomaly pattern of winter atmospheric circulation associated with the DWW entails a substantial bipolar anomaly over the eastern North Atlantic corresponding to a positive NAO phase, which is diagnosed in SLP and Z500 data, and a significant positive anomaly over central-eastern Europe, which is prevalent only in Z500 data (Figure 2e).
 Figure 3, left panels, illustrates the ensemble-average simulated post-eruption winter anomalies of Z500 and SAT corresponding to the reconstructed peak DWW. The Z500 pattern corresponds to the large-scale traits of a positive NAO phase (Figure 3a), although the positive and negative centers are slightly displaced with respect to the climatological positions of the NAO's centers over the Azores and the Labrador Sea (line contour pattern). Consequently, a significant positive Z500 anomaly spreads over Europe, which is consistent with the reconstructions (Figure 2e). The SAT pattern entails significant warming over Scandinavia (Figure 3c), but the maximum amplitude of the anomaly is only about one third of the peak reconstructed anomaly (Figure 2a). According to the rank counts, the ensemble probabilistically underestimates the strength of the NAO-like anomaly over the North Atlantic (Figure 3b) and also the reconstructed SAT anomaly (Figure 3d). However, the rank counts suggest that both the reconstructed SAT and Z500 anomalies lie well within the range of simulated responses. The generally good impression of probabilistic consistency is reinforced by the goodness-of-fit tests, which only sporadically reject the hypothesis of a uniform distribution of the rank counts.
 The DWW described by our results is the first signature of volcanically forced decadal-scale near-surface regional variability that has been consistently diagnosed in climate simulations and reconstructions. Reconstructions and simulations agree that DWW events (1) are generally confined to the winter season and are strongest over northern Europe, (2) are generally associated with a prolonged post-eruption positive NAO phase, and (3) occur about one decade after a major tropical eruption. This corroborates the typical physical mechanism for the DWW described by Zanchettin et al. [2012a].
 The amplitude of ensemble-simulated average DWW signals is weaker than the corresponding reconstructed signals. Zanchettin et al. [2012a] used a different set of generally stronger SVEs in the weak-solar-forcing COSMOS-Mill ensemble, yielding a stronger DWW. This reflects the ample range of possible responses produced by individual events and simulations, meaning that the DWW is not always a dominant component of simulated climate variability. Similar to the classical post-eruption winter warming [e.g., Hegerl et al., 2011], the DWW can only be diagnosed robustly if multiple events are considered. According to our simple assessment of probabilistic consistency, the ensemble simulations may also be slightly negatively biased with respect to the reconstructed anomaly. This leaves room for different interpretations. One concerns the quality of simulated physical processes relevant for the dynamics behind the DWW. The simulated northward-displaced (compared to reconstructions) post-eruption atmospheric anomaly over the North Atlantic may imply a less effective eastward advection of warmer air of Atlantic origin over midlatitude Europe. It reflects structural characteristics of the simulated NAO, which captures the gross features of the observed pattern but slightly displaces the centers of action [Zanchettin et al., 2012a, figure 2a]. In addition, the Atlantic meridional overturning circulation (AMOC), which regulates the northward ocean heat transport in the Atlantic and thereby affects European regional climate, may respond relatively weakly to volcanic forcing in the COSMOS-Mill ensemble (compare, e.g., Stenchikov et al. [2009], Otterå et al. [2010], and Zanchettin et al. [2012a]).
 We further note that in our selection of SVEs several events occur at about 20-year intervals, a value close to the typical length of simulated post-eruption AMOC fluctuations [Zanchettin et al., 2012a]. This interdecadal memory implies that our set of SVEs may entail (simulated) events that are not fully independent. Consequently, uncertainty in the initial conditions may affect the general DWW properties for ensemble SVEs sampled within individual simulations. This interpretation of uncertainty challenges the way (regional to continental) climate responses to volcanic forcing are assessed in long transient simulations, thereby complicating the attribution of past decadal climate variability as well as the potential predictability of decadal responses to individual short-term forcing events such as SVEs.
 Reconstructions and climate simulations for the last five centuries support the hypothesis that strong tropical volcanic eruptions were typically followed, after approximately one decade, by a succession of anomalously warm winters over Europe (delayed winter warming). Cross-validation of ensemble-simulated and reconstructed representations of the delayed winter warming highlights that while delayed winter warming occurs on average, individual realizations can vary substantially due to internal variability affecting intraseasonal to interannual climate dynamics and to different climate conditions at the time of the eruption. Hence, a probabilistic approach is needed for ensemble interpretation.
 The authors thank two anonymous reviewers, whose suggestions helped to improve the presentation of this study. We also thank Jochem Marotzke, Daniela Matei, and Max Popp for useful comments on earlier versions of the manuscript. This work was carried out as part of the MPI-M integrated projects “Millennium” and “Super Volcano,” and was supported by the Federal Ministry for Education and Research in Germany (BMBF) through the research program “MiKlip” (FKZ:01LP1158A (D.Z.):/01LP1130A(C.T.)). O.B. was supported through the Cluster of Excellence “CliSAP,” University of Hamburg, funded through the German Science Foundation (DFG). J.L. acknowledges support from the DFG Projects PRIME 2 (PRecipitation In past Millennia in Europe- extension back to Roman time) within the Priority Program ‘INTERDYNAMIK’. J.L. also acknowledges support from the DFG Project “Historical Climatology of the Middle East based on Arabic sources back to AD 800” and from the EU/FP7 project ACQWA (NO212250). G.H. received support from NERC (NE/G019819/1) and NCAS.