This study analyzes whether the imprint of external forcings can be detected in the long-term evolution of the main atmospheric circulation patterns in climate simulations over the last millennium. The external forcing is not found to significantly add variability in any frequency band compared to control simulations where the external drivers are kept constant. Additionally, a method designed to detect a common signal in the time evolution of these circulation patterns among all simulations is proposed, and employed to demonstrate that the null hypothesis of an evolution dominated by internal variability cannot be rejected regardless of the time smoothing applied to the series. Given that the fingerprint of external forcings on atmospheric circulation has been successfully detected in simulations of the 20th century climate and in future climate change projections, we argue that either the effect of past natural forcing is too small, state-of-the-art climate models underestimate their climate sensitivity, or the anthropogenic forcing qualitatively differs from the natural forcing in its effect on main circulation patterns.
 The variability of the atmospheric circulation in December-to-February is dominated in each hemisphere by an annular mode: the Arctic Oscillation (AO) in the Northern Hemisphere and the Antarctic Oscillation (AAO) in the Southern Hemisphere [Thompson and Wallace, 2000]. Closely related to the former, the North Atlantic Oscillation (NAO) is another important mode of variability, which largely influences the climate of the North Atlantic area in wintertime [Hurrell, 1995]. These annular modes modulate the intensity of the westerlies and the midlatitude storm tracks, exerting a strong impact on regional climates at midlatitudes and hence on human activities. Thus, understanding their variability and its causes is important to estimate the range of possible fluctuations and to evaluate their predictability under climate change scenarios.
 However, a key question is whether their long-term variations are to some extent driven by external forcings, or whether they are dominated by the internal variability of the climate system. Simulations with climate models show a response of the modes of atmospheric circulation to changes in the external forcing. Miller et al.  found that both the AO and AAO intensify in response to anthropogenic greenhouse gas forcing across almost all Coupled Model Intercomparison Project phase 3 (CMIP3) models. Similarly, Stenchikov et al.  analyzed the AO response of the CMIP3 models to volcanic eruptions in the 20th century and generally found an intensification, albeit weaker than that derived from observations. Swingedouw et al.  studied a long paleoclimate simulation and described how forcings can exert a fingerprint on the NAO evolution with a time lag longer than 40 years. More recently, Zanchettin et al.  identified a coherent decadal response of the NAO to strong volcanic eruptions in an ensemble of climate simulations of the last millennium and suggested a physical explanation. Similarly, Scaife et al.  used an ensemble of 20 ideal simulations with the HadGEM3 model to analyze the response of the NAO in the 5 years following an abrupt change in the solar irradiance, finding a delayed, physically plausible, response to these changes. Thus, a response of the atmospheric modes over the past millennium can be a priori expected at various timescales, although its detection outside the range of internal variations will depend on the magnitude of the forcing and on the realism of the response simulated by climate models. In general terms, the above question is related to the evaluation of the signal-to-noise ratio, which depends on the variable under consideration. 
For temperature, the fingerprint of the external forcings is very clear and can be detected even at regional scales, whereas for other variables, such as precipitation, the internal variability becomes dominant [Gomez-Navarro et al., 2012]. However, the origin of the low-frequency (decadal to centennial) variability of the NAO, AO, and AAO is still debated and can only be addressed through the use of large ensembles of climate simulations forced by realistic reconstructions of the external factors.
 Additionally, there exist a number of climate reconstructions for these variability modes. Notably, the NAO index has been reconstructed by a number of researchers based on different proxy indicators over the last centuries (see Pinto and Raible  for a recent review). However, although the reconstructed indices compare relatively well in the period used as calibration, disagreements become apparent in the preindustrial period. In any case, the analysis of the evolution of the annular modes and of the NAO in paleoreconstructions is only useful to constrain their possible response to anthropogenic greenhouse gas forcing inasmuch as their past evolution is also driven by the external forcings. Trouet et al.  found that the reconstructed NAO index over the past millennium follows the evolution of the Northern Hemisphere temperature at centennial timescales, with a stronger NAO during the Medieval Climate Anomaly (MCA) and in the recent warm period and a weaker NAO during the Little Ice Age (LIA). However, whether this is caused by a response of the NAO to external forcing is not clear, since the ultimate character and extent of the MCA as a manifestation of internal variability is still debated [Goosse et al., 2012].
 Hence, in the present study, we analyze, for the first time, a large ensemble of state-of-the-art climate paleosimulations over the last millennium to address the question of whether there is a long-term coherent response of the atmospheric annular modes of variability to the external forcings that significantly differs from the background of internal variability.
2 Data and Methodology
2.1 Climate Simulations
 This study employs state-of-the-art paleoclimate simulations belonging to two main ensembles with their respective control runs. On the one hand, a total of 10 simulations (denoted here as MILL-STRONG, MILL-WEAK, and MILL-CONTROL) with the same model setup have been performed under the umbrella of the MILLENNIUM project. On the other hand, the available simulations for the last millennium within the CMIP5 initiative have also been considered (denoted as Max-Planck-Institut (MPI), National Center for Atmospheric Research (NCAR), Institut Pierre-Simon Laplace (IPSL), Beijing Climate Center (BCC), NASA, and their respective control runs). Although the size of this multimodel ensemble is smaller (five members so far), it comprises different model setups, which tend to follow the same protocol [Taylor et al., 2012]. For further details on the meaning of these aliases, the resolution, model setup, and forcings employed for these simulations, the reader is referred to the supporting information.
 We analyze the evolution of the indices of the December-to-February NAO, AO, and AAO within different climate simulations, defined by a Principal Component Analysis (see supporting information for technical details). First, the spectra of the series in the forced and unforced simulations are analyzed and compared to assess whether the forcing introduces additional variability in any frequency band. The spectra of the series have been obtained with the Bartlett estimator using a cutoff window of 50 years [von Storch and Zwiers, 2003].
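A lag-window spectral estimate of this kind can be sketched as follows. This is only a minimal illustration, assuming the Bartlett estimator is implemented as a triangular taper applied to the sample autocovariance up to the cutoff lag; the function name `bartlett_spectrum` and the frequency grid are hypothetical choices, not taken from the study.

```python
import numpy as np

def bartlett_spectrum(x, cutoff=50, n_freq=256):
    """Lag-window spectral estimate with a Bartlett (triangular) window.

    x      : 1-D series (e.g., one yearly circulation index)
    cutoff : truncation lag of the window, in time steps (assumed: 50 years)
    n_freq : number of frequencies between 0 and 0.5 cycles per time step
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    lags = np.arange(cutoff)
    # Biased sample autocovariance for lags 0 .. cutoff-1
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in lags])
    # Triangular weights, decreasing linearly and vanishing at the cutoff lag
    window = 1.0 - lags / cutoff
    freqs = np.linspace(0.0, 0.5, n_freq)
    cosines = np.cos(2.0 * np.pi * np.outer(freqs, lags[1:]))
    spec = acov[0] + 2.0 * cosines @ (window[1:] * acov[1:])
    return freqs, spec
```

Applied to a forced run and to its control run, the two estimates can then be compared band by band. The integral of the estimate over all frequencies equals the variance of the series, so differences between the curves reflect how the variance is distributed across frequencies.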
 Additionally, the common signal in the ensemble, and its statistical significance, has been identified by Principal Component Analysis. The method consists of pooling all the indices representing each variability mode derived from all simulations, and calculating the Empirical Orthogonal Functions of the resulting sets of indices. Given N different simulations, each one generating an index (NAO, AO, or AAO) for T time steps, the pseudo field
$$X(t) = \left(x_1(t), x_2(t), \ldots, x_N(t)\right), \qquad t = 1, \ldots, T,$$
where $x_i(t)$ denotes the index from simulation $i$,
is constructed and its first Empirical Orthogonal Function (EOF) computed. If the variability in the series is completely due to internal dynamics, the series should be uncorrelated, and thus, the total variance is equally spread over all the N EOFs, with an expected value for the percentage of variance explained by each EOF equal to 100/N. On the other hand, large departures from this expectation value will indicate that the series present a significant amount of redundancy, pointing to a cross correlation of these indices across the simulations, caused by the response of the climate system to the external forcings.
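The statistic described above can be sketched in a few lines. This is an illustrative sketch only: the standardization of each series before the decomposition is an assumption made here so that every simulation carries equal weight, and the function name `leading_eof_variance` is hypothetical.

```python
import numpy as np

def leading_eof_variance(indices):
    """Percentage of variance explained by the first EOF of pooled indices.

    indices : array of shape (T, N), one column per simulation.
    """
    x = np.asarray(indices, dtype=float)
    # Assumption: each series is standardized before pooling
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # Squared singular values are proportional to the EOF variances
    variances = np.linalg.svd(x, compute_uv=False) ** 2
    return 100.0 * variances[0] / variances.sum()
```

For series of finite length, the leading EOF always explains at least 100/N percent of the variance, because the largest eigenvalue cannot be smaller than the mean eigenvalue. The departure from 100/N therefore has to be judged against a numerically estimated null distribution rather than against 100/N itself.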
 The probability distribution of the variance explained by the leading EOF is unknown, and it has been estimated numerically in this study through a combination of bootstrap and Monte Carlo methods. We use the original simulated series as seeds for a bootstrap method to generate surrogate series that are mutually uncorrelated but that retain the same serial correlation structure as the original series [Ebisuzaki, 1997]. The surrogate series are then pooled, and the percentage of variance explained by the first EOF is obtained for these series. We repeat this Monte Carlo experiment 2000 times, which allows us to numerically obtain the distribution of this statistic under the null hypothesis of mutually uncorrelated series.
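The procedure can be sketched as below, assuming that the surrogates are produced by randomizing the Fourier phases of each series, which preserves its power spectrum and hence its autocorrelation. The handling of the zero-frequency and Nyquist components is simplified here, and all function names are hypothetical; `leading_eof_variance` is included only so that the block runs on its own.

```python
import numpy as np

def leading_eof_variance(indices):
    """Percentage of variance explained by the first EOF of (T, N) indices."""
    x = np.asarray(indices, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    variances = np.linalg.svd(x, compute_uv=False) ** 2
    return 100.0 * variances[0] / variances.sum()

def phase_randomized(x, rng):
    """Surrogate with the same amplitude spectrum as x but random phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    phases[0] = 0.0            # zero-frequency component stays real
    if n % 2 == 0:
        phases[-1] = 0.0       # Nyquist component stays real for even n
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=n)

def null_threshold(indices, n_iter=2000, level=95.0, seed=0):
    """Percentile of the leading-EOF variance under mutually uncorrelated series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(indices, dtype=float)
    stats = np.empty(n_iter)
    for i in range(n_iter):
        # Independent phases per series remove any cross correlation
        surrogate = np.column_stack([phase_randomized(col, rng) for col in x.T])
        stats[i] = leading_eof_variance(surrogate)
    return np.percentile(stats, level)
```

The null hypothesis would then be rejected when `leading_eof_variance(indices)` exceeds `null_threshold(indices)`.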
 The leading EOFs derived from the seasonal SLP field, defining the NAO, AO, and AAO indices, are shown for one of the MILL-WEAK simulations in the first row of Figure 1. The patterns are very similar across the simulations, with spatial cross correlations above 0.9 in all cases. The associated indices for all simulations, resulting from the projection of the SLP anomalies onto these patterns, are shown in the lower panels in the same figure, together with the reconstructions of the external forcings. Larger differences appear in the temporal component, which is the one analyzed hereafter.
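The definition of an index by projection onto the leading EOF can be sketched as follows. The actual technical details are given in the supporting information; here the square-root-of-cosine latitude weighting, the absence of any domain selection, and the function name `leading_mode_index` are assumptions made for illustration only.

```python
import numpy as np

def leading_mode_index(slp, lat):
    """Leading EOF of seasonal SLP anomalies and its standardized index.

    slp : array (time, lat, lon) of seasonal-mean sea level pressure
    lat : latitudes in degrees, matching the second axis of slp
    """
    anom = slp - slp.mean(axis=0)
    # Assumed area weighting: sqrt(cos(latitude)) applied to the anomalies
    weights = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]
    flat = (anom * weights).reshape(anom.shape[0], -1)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    pattern = vt[0]                      # leading (weighted) spatial pattern
    index = flat @ pattern               # projection of anomalies onto it
    index = (index - index.mean()) / index.std()
    return index, pattern.reshape(anom.shape[1:])
```

The sign of an EOF is arbitrary, so in practice the pattern and its index would have to be oriented consistently across simulations before the series are compared.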
3.1 Spectra of the Simulations
 The spectra of the NAO, AO, and AAO series for the CMIP5-forced (control) simulations are depicted in solid (dashed) lines in Figure 2, together with the spectrum of the total external forcings (black thin line, on a different vertical scale) for comparison purposes only. The shaded area represents the range of spectra within the MILLENNIUM ensemble and illustrates the spread in the calculation of the spectra within the same model. It is roughly of the same size as the error in the estimation of the spectra, illustrated by the vertical segment.
 While the forcing spectrum is slanted toward low frequencies, the simulations exhibit rather flat spectra, which distribute the variance homogeneously over all frequency bands. The spread among different model runs is in general large enough to encompass the spread in the control runs, and it is similar to the error bars and to the spread of the MILLENNIUM ensemble. The only exception seems to be the variability of the AO in the BCC model at periods of around 25 years. However, in this case, the signal is inconsistent, showing opposite behavior in the AO and AAO indices, and it is not shared with the rest of the models in the ensemble. In general terms, this figure strongly suggests that the external forcing does not consistently add variability to the background variability of internal processes in any frequency band. Intermodel and intramodel spectra are similar and do not differ significantly between forced and control runs.
3.2 Temporal Coherence Across Simulations
 Figure 1 shows the temporal evolution of the three circulation indices in the forced runs. The series appear incoherent, and in particular, it is hardly possible to identify the fingerprint of the external forcings in periods when they change strongly, such as the Late Maunder Minimum or the industrial period. The temporal coherence of these indices among the simulations has been analyzed by EOF analysis, following the methodology described above, and employing several degrees of time filtering (window sizes of 1, 51, and 101 years). The results are summarized in Table 1. When the series of the 13 model runs are pooled and the EOF analysis performed, in no case (see the columns labeled 1000-2000) is the amount of variance explained by the first EOF significantly different from what could be expected by chance (the threshold to reject the null hypothesis at the 95% confidence level is shown in parentheses). Note, however, that the level of smoothing modifies the serial correlation of the series, which in turn sets the threshold of significance of the amount of variance explained by the first EOF of the pooled series well above the theoretical level for p=0.01 expected if the series were white noise. This can be clearly appreciated through the Monte Carlo simulations.
Table 1. Percentage of Variance Explained by First EOF of the Pooled Circulation Indices^a
^a For each index, the left column shows the results within the paleosimulations, whereas the right column shows the corresponding results obtained in an ensemble of climate change projections for comparison (see main text for details). The calculations have been applied to the yearly series as well as to the series after being time-filtered by 51 and 101 year Hamming windows (31 and 51 years, respectively, in the climate change projections ensemble). The 95% confidence level threshold calculated through 2000 Monte Carlo simulations is shown in parentheses. Significant results are emphasized by bold characters.
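The dependence of the significance threshold on the smoothing can be illustrated with a small numerical experiment. The sketch below deliberately departs from the procedure of the study in one respect: it uses independent white-noise series instead of surrogates derived from the simulated indices, so the numbers it produces are not those of Table 1. The ensemble size (13), the series length (1000 time steps), and the function names are illustrative assumptions.

```python
import numpy as np

def hamming_smooth(x, width):
    """Weighted running mean with a normalized Hamming window."""
    window = np.hamming(width)
    window = window / window.sum()
    return np.convolve(x, window, mode="valid")

def leading_variance_percent(series):
    """Percentage of variance in the first EOF of standardized (T, N) series."""
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(series, rowvar=False))
    return 100.0 * eigenvalues[-1] / eigenvalues.sum()

def null_threshold_white(width, n_runs=13, n_years=1000, n_iter=200,
                         level=95.0, seed=0):
    """Threshold for smoothed, mutually independent white-noise series."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_iter)
    for i in range(n_iter):
        white = rng.standard_normal((n_years, n_runs))
        smooth = np.column_stack([hamming_smooth(col, width) for col in white.T])
        stats[i] = leading_variance_percent(smooth)
    return np.percentile(stats, level)

if __name__ == "__main__":
    for width in (1, 51, 101):
        print(width, round(null_threshold_white(width), 1))
```

Wider windows leave fewer effectively independent samples in each series, so chance correlations between runs grow and the threshold rises. This is why the significance level has to be re-estimated for every degree of smoothing.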
 Although we have not been able to reject the null hypothesis in any of the cases, this could be due to a low power of the test, i.e., a limited capability to reject the null hypothesis when it is indeed false. To demonstrate that this is not the case, we have applied the test to a set of five climate change projections in which a forced response is expected: the continuation of the three MILL-STRONG simulations under a climate change scenario, together with two projections performed with the model ECHO-G under the SRES A2 and B2 scenarios [Zorita et al., 2005]. Both models have similar resolution (T30 and T31) and contain follow-up versions of the same atmospheric model (the European Centre/Hamburg family developed at the MPI-M). Under the strong forcings considered for future projections, both model versions tend to simulate a coherent AO response [Miller et al., 2006]. Thus, the detection method based on EOF analysis should be able to detect this forcing signal in this five-member ensemble. The results of this test are shown in the columns labeled 2006-2100 in Table 1 (note that in this case, shorter windows have been considered in the smoothing given the short period simulated). The NAO and AO indices, in this case, contain an important level of coherency in response to the common external forcing, which inflates the amount of variance explained by the first EOF beyond what could be expected by chance. The only case where the simulations seem to be dominated by internal variability is for the AAO index, but only when a 31 year window is used to smooth the series.
 Complementing earlier studies that focused on the short-term response of the atmospheric circulation during the years following abrupt changes in the external forcings [Zanchettin et al., 2012, 2013; Scaife et al., 2013], this study employs an ensemble of recently developed climate simulations over the past millennium to focus on their long-term temporal behavior. On the one hand, it has been shown that the spectra of these indices distribute the variance homogeneously over all frequency bands and that there are no significant differences between the forced and control simulations. Thus, if external forcings drive the long-term evolution of these indices, their fingerprint is blurred by the noise of internal variability of the ensemble. On the other hand, a test based on Principal Component Analysis has been designed to establish whether the null hypothesis of long-term variability dominated by internal processes can be rejected. Several degrees of smoothing have been applied to the indices, but in no case could we reject the hypothesis of uncorrelated evolution across the ensemble of simulations.
 There are several possible interpretations for these results. One is that the variability of the forcing during the past millennium is indeed not large enough to exert a detectable influence on the long-term evolution of these large-scale circulation patterns. Another possibility is that the amplitude of the reconstructed forcing employed in the simulations or its implementation in the models is not realistic enough. This question is still debated, but since some of the simulations included within the ensemble already implement large-variance reconstructions, the amplitude of the forcing required to detect a signal would be beyond what is currently considered realistic. Additionally, Shindell et al.  suggested that a fine stratosphere resolution and its dynamical coupling to photochemistry may be required to simulate a realistic response of atmospheric modes to solar forcing. This is partly in contradiction to Miller et al. , since the meridional temperature gradient seems to be capable of producing a strong response to greenhouse gas forcing of atmospheric modes in some, although not all, conventional models. A third possibility is that the sensitivity of atmospheric circulation modes of current climate models is unrealistically low. Previous studies, Miller et al.  and Stenchikov et al.  found evidence that although Fourth Assessment Report models do display a response to future climate forcings, their sensitivity to volcanic eruptions in the 20th century is underestimated compared with observations. A fourth possibility is that the signature of the external forcing becomes recognizable only after a stronger time filtering than has been considered in this study. The methodology employed here is designed to identify the long-term response. A shorter-term response in the NAO variability at decadal scales has indeed been identified by means of superposed epoch analysis [Zanchettin et al., 2013].
Finally, the response of the tropospheric circulation might not be describable by changes in these winter circulation modes.
 The implications of these results are multifaceted. Since the circulation modes have a strong influence on seasonal precipitation or temperature in some regions, e.g., the NAO on winter Mediterranean precipitation or Scandinavian winter temperature, a disagreement between seasonal regional reconstructions and model simulations is not indicative of a deficient reconstructed or modeled climate, since both would be strongly influenced by internal dynamics rather than by the external forcings. A prominent role of the atmospheric modes in explaining the transition from the MCA to the LIA [Trouet et al., 2009] does not seem to be substantiated by these model ensembles. Additionally, if the ratio between internally generated and externally forced variability of the circulation modes is in reality higher than simulated in climate change simulations, the spread of regional climate projections for those regions will also be underestimated.
 This work was supported by the PRIME2 project (priority program INTERDYNAMIK, German Research Foundation). We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank all the climate modeling groups for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals. The authors also thank Johann Jungclaus for his instructive comments on the first versions of this article, as well as the MPI-M Integrated Project Millennium for making available some of the simulations used in this analysis. Finally, the authors thank the editor and the two referees for their helpful comments on this manuscript.
 The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.