Initialized and uninitialized decadal retrospective forecasts (re-forecasts) are used to assess the key regions providing multi-year prediction skill of the Atlantic multi-decadal sea surface temperature variability (AMV) and to address the relative roles of the initial conditions and external forcing on this skill. The results show that there is a decay in the AMV skill with forecast time, which is likely to be driven by skill degradation in predicting the AMV subtropical branch. An important role of the varying radiative forcing in the AMV-related prediction skill is found over the Labrador and Irminger deep convection regions. Initialization yields the largest improvement in the AMV-related skill over the area where the Atlantic subpolar gyre operates. Initialization also appears to correct an unrealistic anticorrelation between the AMV phase and the Gulf Stream found in the uninitialized re-forecasts.
 Decadal to multi-decadal climate variability has been identified in the Atlantic Ocean, particularly in the North Atlantic basin. In this region, low-frequency sea surface temperature (SST) variability is referred to as Atlantic multi-decadal variability (AMV) or Atlantic multi-decadal oscillation (AMO) [e.g., Smith et al., 2012]. The large-scale AMV SST pattern is thought to be related to the Atlantic meridional overturning circulation (AMOC) multi-decadal variations, or the Atlantic portion of the thermohaline circulation [e.g., Knight et al., 2005; Dijkstra et al., 2006; Zhang, 2008]. Potential predictability for the North Atlantic SST has been found on decadal time-scales [e.g., Pohlmann et al., 2004]. Recent studies using initialized decadal hindcasts have illustrated encouraging skill in forecasting the AMV a few years ahead [Pohlmann et al., 2009; Doblas-Reyes et al., 2011; van Oldenborgh et al., 2012; García-Serrano and Doblas-Reyes, 2012]. However, the roles of internal natural variability and external forcing in the AMV forecast skill have not been clearly identified so far. There is, therefore, scope for a focused investigation to better understand the AMV prediction skill.
 The AMV affects climate and weather conditions around the North Atlantic and in remote regions, particularly the surface temperature and precipitation in North America and Europe, hurricane activity in the tropical Atlantic, precipitation in South America and West Africa, and climate variability in the tropical Pacific (for reviews, see Murphy et al. and Smith et al.). Thus, predicting the AMV and understanding its predictability sources is of great relevance for both the climate forecasting community and socioeconomic applications. Near-term re-forecasts have helped to obtain improvements in some aspects of interannual-to-decadal variability prediction skill, with particular emphasis on the North Atlantic basin [Keenlyside et al., 2008; H. Pohlmann et al., Predictability of the mid-latitude Atlantic meridional overturning circulation in a multi-model system, submitted to Climate Dynamics, 2012]. The relative contributions to the AMV skill from initial and boundary (i.e., varying radiative forcing) conditions are, however, still unclear. Previous results suggest that AMOC-related internal variability drives subpolar SSTs, while subtropical SSTs are largely controlled by external forcings [Ottera et al., 2010; van Oldenborgh et al., 2012]. This manuscript uses the best available set of decadal re-forecasts to investigate these two facets of the AMV predictive skill. The objective of this research is twofold: first, to assess multi-year prediction skill of the AMV SST index by comparing a dynamical multi-forecast system with simple empirical (statistical) predictions; second, to shed some light on the processes contributing to the identified skill in the North Atlantic region.
2. DePreSys Decadal Re-forecasts and Methodology
 This study uses the decadal re-forecasts of the perturbed-parameter ensemble predictions produced with the Met Office Decadal Climate Prediction System (DePreSys) [Smith et al., 2007, 2010] as part of the EU project ENSEMBLES [Doblas-Reyes et al., 2010]. Ten-year long re-forecasts were started on the first of November once every year over the period 1960–2005 using a nine-member ensemble of HadCM3 model variants. The set of initialized decadal re-forecasts used in this study was run explicitly prescribing the contemporaneous state of the climate system at the start date; these will be referred to simply as DePreSys. In order to assess the impact of the initialization, an additional set of uninitialized re-forecasts (referred to as NoAssim) with the same nine model versions was run. What is unique in these two sets and in the ENSEMBLES multi-model, in comparison with the Coupled Model Intercomparison Project phase 5 (CMIP5) protocol (K. E. Taylor et al., A summary of the CMIP5 experimental design, 2011, http://www-pcmdi.llnl.gov/), is that they use only information available in a real-time forecast context. For that reason we refer to these decadal integrations as re-forecasts, not hindcasts. The ENSEMBLES decadal re-forecasts do not include observed time-dependent variations in solar activity or volcanic aerosols, but they take into account changes in external forcing such as greenhouse gases, sulphate aerosols, and anthropogenic emissions. Further description of the experimental design is provided in the auxiliary material.
 The study focuses on time-averaged predictions based on annual means, ranging from January to December. Only nine annual values are available along the forecast time because all the decadal predictions start on November 1st and end on October 31st of the last year. The DePreSys/NoAssim model climate is estimated here by averaging raw forecasts along the actual time (the start date dimension) according to availability of observations. All re-forecast anomaly time-series have been computed, for each model version separately, by removing a model climate estimate at each forecast time. A four-year forecast average is then applied to these bias-corrected anomaly time-series. Based on the World Climate Research Programme (WCRP) recommendations, the observational climatology has been estimated as the average over the period for which both observations and re-forecasts for a specific forecast time are available, instead of using a longer-term mean [García-Serrano and Doblas-Reyes, 2012]. The observational anomalies are also used after performing a four-year average. To ensure having the same number of verification years at each forecast time, the verification period employed corresponds to 1966–2009. Following the four-year average approach, this implies that the common verification period spans from 1966/69 to 2006/09, with a total of 41 values per forecast time. Due to the strong autocorrelation of the AMV index, the 41 observational data points represent slightly above 10 effective degrees of freedom [von Storch and Zwiers, 2001], which for statistical significance at the 5% level corresponds to a correlation coefficient of around 0.54 (one-tailed t-test, as only positive correlations indicate skill).
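The anomaly processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the nested-list layout `raw[s][f]` (start date `s`, forecast year `f`), and the `has_obs` mask are all hypothetical, and the sketch handles a single model version.

```python
from statistics import mean


def lead_dependent_anomalies(raw, has_obs):
    """Remove a model climate that depends on forecast time.

    raw[s][f]     : raw annual-mean forecast for start date s, forecast year f
    has_obs[s][f] : True where a verifying observation is available
    (Both layouts are assumptions made for this sketch.)
    """
    n_start, n_lead = len(raw), len(raw[0])
    # Model climate at each forecast time: average along the start-date
    # dimension, using only the start dates that can be verified.
    clim = [
        mean(raw[s][f] for s in range(n_start) if has_obs[s][f])
        for f in range(n_lead)
    ]
    return [[raw[s][f] - clim[f] for f in range(n_lead)] for s in range(n_start)]


def four_year_means(series):
    """Running four-year averages (forecast years 1-4, 2-5, ..., 6-9)."""
    return [mean(series[i:i + 4]) for i in range(len(series) - 3)]
```

With nine annual values per re-forecast, `four_year_means` returns six forecast averages. Removing the climate separately at each forecast time is what takes out the model drift before the averaging.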
 The AMV index adopted in this study was defined in Trenberth and Shea: SST anomalies averaged over EQ-60°N / 280°E-360°E minus global SST anomalies averaged over 60°S-60°N. Later in the manuscript the area defining the AMV index is subdivided into northern and southern parts. The former extends from 30°N to 60°N, involving subpolar latitudes, and will be referred to as AMV-SP; the latter covers the EQ-30°N latitudinal range, representing subtropical latitudes, and will be referred to as AMV-ST. These areas define two different SST indices after subtracting the global SST average, as in the AMV definition.
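A minimal sketch of the three index definitions for a single time step is given below. The box limits come from the text; the cosine-latitude area weighting, the 0-360°E longitude convention, the use of `None` for land points, and the names `box_mean` and `amv_indices` are assumptions made for illustration.

```python
import math


def box_mean(sst, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of sst[i][j] inside a latitude/longitude box.

    Weights are cos(latitude) (an assumption); None marks missing or land points.
    """
    num = den = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max and sst[i][j] is not None:
                num += weight * sst[i][j]
                den += weight
    return num / den


def amv_indices(sst_anom, lats, lons):
    """AMV, AMV-SP and AMV-ST: regional mean minus the 60S-60N global mean."""
    global_mean = box_mean(sst_anom, lats, lons, -60, 60, 0, 360)
    boxes = {"AMV": (0, 60), "AMV-SP": (30, 60), "AMV-ST": (0, 30)}
    return {
        name: box_mean(sst_anom, lats, lons, south, north, 280, 360) - global_mean
        for name, (south, north) in boxes.items()
    }
```

Because the same global mean is subtracted from each regional average, differences between AMV-SP and AMV-ST reflect only the regional SST anomalies.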
 The DePreSys/NoAssim decadal predictions are compared against four empirical prediction methods. Three of these methods are based on 1-year, 5-year and 10-year persistence, whereas the fourth empirical method is based on a simple lagged regression model. Details of these statistical methods are given in the auxiliary material. All empirical predictions are produced and verified for the common 1966/69 to 2006/09 period. The NOAA extended reconstructed SST v3b (ERSST) [Smith et al., 2008] has been used as the reference dataset for verification and to train the empirical models.
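As a rough illustration of what an N-year persistence forecast can look like, the sketch below uses the mean of the last N complete calendar years before the start date and holds it fixed for all forecast times. The exact formulation used in this study is given in the auxiliary material and may differ; the function name and the `obs_by_year` dictionary are hypothetical.

```python
from statistics import mean


def persistence_forecast(obs_by_year, start_year, n_years):
    """Mean of the n_years complete years preceding start_year.

    Assumption for this sketch: a forecast started in November of
    start_year only has complete annual means up to start_year - 1.
    """
    past = [obs_by_year[y] for y in range(start_year - n_years, start_year)]
    return mean(past)
```

Under this assumption, a 1-year persistence forecast started in 2000 simply repeats the 1999 annual anomaly, while the 5-year and 10-year versions smooth over more of the recent past.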
3. Skill of the AMV
 To evaluate the annual AMV forecast quality and the contribution of the initialization to the skill, four-year averages are considered, as they represent a compromise between the capability to partially remove the unpredictable interannual variability in near-term dynamical forecasting (e.g., the link to ENSO) and the ability to partially represent skill evolution along the forecast time [García-Serrano and Doblas-Reyes, 2012]. Figure 1a depicts the observed AMV spatial pattern based on the correlation map of four-year averaged annual SST anomalies onto the AMV index for the common verification period 1966/69–2006/09. The AMV signature is highlighted by two well-established regions with positive correlation greater than 0.6 that extend over the North Atlantic subpolar and subtropical latitudes, respectively [e.g., Sutton and Hodson, 2005; Knight et al., 2005; Trenberth and Shea, 2006; van Oldenborgh et al., 2012; Ottera et al., 2010]. The time series of the associated AMV index illustrates how low-frequency North Atlantic SST varies on a multi-decadal timescale (Figure S1 in Text S1). The observed AMV index displayed a negative phase from the late 1960s to the early 1990s, and then changed towards a positive phase that continues to date.
Figure 1b shows multi-annual prediction skill, in terms of correlation between ensemble-mean forecasts and observations for the period from 1966/69 to 2006/09, of the AMV index for both initialized (DePreSys, purple) and uninitialized (NoAssim, pink) decadal re-forecasts, as well as for the four empirical methods. The worst performances are those from the regression-based statistical model (blue) and the ten-year persistence (black dotted). The models based on one- (black solid) and five-year (black dashed) persistence yield positive correlations along the whole forecast range. However, none of the empirical methods has a skill high enough to be statistically significant. NoAssim has comparable skill to the statistical models in the first part of the re-forecast. Later in the forecast time, NoAssim shows statistically significant positive correlations. A monotonic increase of the AMV skill with forecast time is noticeable for NoAssim. Regardless of its origin, this result suggests a non-negligible contribution to the multi-annual AMV prediction skill from the external forcing. The initialized experiment yields significantly skillful predictions of the AMV index over the entire re-forecast period, although at the end of the range the skill becomes indistinguishable from that of NoAssim. This suggests that the predictability associated with initial conditions has probably vanished by then. In DePreSys the correlation decreases with forecast time, from 0.8 in the first four-year average (years 1–4) to 0.6 in the last one (years 6–9). This degradation in the AMV multi-year skill has been found, albeit not discussed, in recent studies with the MIROC decadal predictions [Mochizuki et al., 2012], CMIP5 decadal hindcasts [Kim et al., 2012], the ENSEMBLES decadal re-forecasts [van Oldenborgh et al., 2012; García-Serrano and Doblas-Reyes, 2012], and different initialization strategies with the ECMWF decadal forecast system (S. Corti, personal communication, 2012).
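The correlation skill and its significance threshold can be sketched as follows. The helper names are hypothetical. The threshold calculation assumes that the roughly 10 effective degrees of freedom quoted in section 2 act as an effective sample size, so that the t-test has 8 degrees of freedom, and it takes the one-tailed 5% critical value (1.860) from standard Student's t tables.

```python
import math


def ensemble_mean(members):
    """Average members[m][t] over the ensemble dimension m."""
    return [sum(vals) / len(vals) for vals in zip(*members)]


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_crit, n_eff):
    """Smallest correlation that passes a t-test with n_eff - 2 degrees of freedom.

    Inverts t = r * sqrt(dof / (1 - r**2)) for r.
    """
    dof = n_eff - 2
    return t_crit / math.sqrt(t_crit ** 2 + dof)


# Tabulated one-tailed 5% value of Student's t for 8 degrees of freedom.
threshold = critical_r(1.860, 10)  # about 0.55
```

Under these assumptions the threshold comes out close to the value of around 0.54 quoted in section 2; the small difference is consistent with the effective sample size being slightly above 10.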
4. Skill in AMV-Related North Atlantic SSTs
 This section investigates regions in the North Atlantic basin that potentially contribute to improved AMV prediction skill in DePreSys when compared to NoAssim. Rather than assessing the ensemble-mean temperature skill at the grid-point level [Doblas-Reyes et al., 2010; Smith et al., 2010], the aim is to gain insight into the spatial pattern of the AMV forecast skill. The investigation is performed by computing at each grid-point the correlation coefficient between ensemble-mean re-forecast SST anomalies and the observed AMV index (Figure 2). This methodology seeks to find regions accounting for multi-annual skill in observed AMV low-frequency variations. These correlation-based skill maps compare the DePreSys and NoAssim AMV-related SST patterns. Comparison with observations (Figure 1a) is shown in Figure S2 in Text S1. This auxiliary figure shows that the AMV signature in both experiments is remarkably similar, although the correlations in DePreSys are generally higher than in NoAssim. This suggests that the AMV signal is more representative of the regional SST variations in the initialized re-forecasts. Note also the unrealistic anticorrelation between the AMV phase and SST anomalies along the Gulf Stream in NoAssim, and how the initialization corrects this error (cf. Figures 1a and S2).
 DePreSys re-forecasts (Figure 2, left) have maximum correlations at subpolar latitudes along the entire forecast range, although the correlation patterns project onto the observed horseshoe-like AMV signature (Figure 1a) for the 1–4 and 2–5 forecast averages. The largest significant correlations in NoAssim (Figure 2, middle) are at both sides of Greenland, with a particularly strong signal over the Labrador Sea. Significant correlations also appear around Iceland, especially to the south of the Greenland-Scotland ridge, but also over the Irminger Sea and Fram Strait. The correlation difference between DePreSys and NoAssim (Figure 2, right) suggests that initialization is critical for predicting the correct evolution of the observed AMV-related SST anomalies over the northern North Atlantic along the whole forecast range, but with special emphasis during the first part of the decadal re-forecast (up to 4–7 years). Note that due to the low number of effective degrees of freedom in the observed AMV index, very few grid points of the DePreSys-NoAssim correlation differences are statistically significant in the 1–4 forecast average; areas with open circles in Figure 2 stand for differences larger than 0.5. The larger correlation differences between the 1–4 and 4–7 forecast averages are found southeast of Greenland, where the observed Atlantic subpolar gyre is located. The correlation differences along the Gulf Stream are due to small positive correlations in DePreSys (in agreement with observations; Figure 1a) and, essentially, to negative ones in NoAssim. The latter are associated with the error of NoAssim in its AMV pattern, in which the AMV signal is unrealistically out of phase with SST variations along the Gulf Stream (Figure S2 in Text S1). Hence, in this decadal prediction system, the initialization is also able to correct this failure in the model performance.
 The domain defining the North Atlantic SST average in the AMV index was next subdivided into the northern (AMV-SP) and southern (AMV-ST) parts, according to the branches of the AMV pattern (Figure 1a and section 2). Autocorrelation function analysis was performed first, as the autocorrelation timescale is related to the timescale of persistence decay. Figure 1c shows the autocorrelation function of the observed AMV (bars), AMV-SP (solid) and AMV-ST (dashed) indices in the period 1900–2009, which includes the training period for the regression-based statistical model and the verification period for the skill assessment. The autocorrelation functions of the AMV and AMV-SP indices are very similar and correspond to a process with long-term memory, in which ocean dynamics appears to play a larger role than damping, as shown below. For the AMV-ST, the autocorrelation function decays rapidly as the lag increases. The same findings are obtained in the observations when different sub-periods over the last century, including the one used to assess the forecast quality, are analyzed (Figure S3 in Text S1) and in the decadal re-forecasts (Figure S4 in Text S1). This contrasting behaviour between higher latitudes and the tropics is consistent with previous predictability studies based on perfect model experiments [Pohlmann et al., 2004; Branstator and Teng, 2010], potentially predictable variance fraction [Boer and Lambert, 2008], and long control integrations [Hawkins and Sutton, 2009; Branstator et al., 2012].
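For readers who want to reproduce this kind of diagnostic, a sample autocorrelation function can be sketched as below. The estimator shown (lagged products normalised by the lag-0 sum of squares) is one common choice and is an assumption here, since the text does not state which estimator was used; the function name is hypothetical.

```python
from statistics import mean


def autocorrelation(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag.

    Lagged products of deviations from the mean are divided by the
    lag-0 sum of squares, so the value at lag 0 is 1.
    """
    centre = mean(series)
    dev = [value - centre for value in series]
    total = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + lag] for i in range(len(dev) - lag)) / total
        for lag in range(max_lag + 1)
    ]
```

A slowly varying index keeps values near 1 over the first few lags, as described for AMV and AMV-SP, whereas an index dominated by year-to-year noise drops towards zero quickly, as described for AMV-ST.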
Figures 1d and 1e display the AMV-SP and AMV-ST ensemble-mean correlation for the dynamical and statistical predictions, respectively. Except for DePreSys at the first forecast average (years 1–4), no prediction is skillful in capturing the evolution of the AMV-ST index, and a clear decrease of about 0.4 in correlation is found for this initialized forecast system (Figure 1e). A very different result emerges for AMV-SP, where DePreSys shows an almost constant correlation skill of around 0.8 for all forecast ranges (Figure 1d). This finding leads us to speculate that the distinctive skill degradation of the DePreSys AMV predictions with forecast time (Figure 1b) is likely associated with the inclusion of subtropical latitudes in the SST area average. Further support for this reasoning is provided by the notably distinct autocorrelation functions of AMV-SP and AMV-ST (Figure 1c). Finally, it is worth highlighting that the positive correlation scores of NoAssim AMV-SP along the entire re-forecast suggest an important contribution of varying radiative forcing to the prediction skill of northern North Atlantic low-frequency SST variability. This is consistent with the ability of NoAssim to capture the evolution of AMV-related SST anomalies over deep convection regions at both sides of Greenland found above (Figure 2).
5. Summary and Conclusions
 Decadal prediction aims to explore the benefits of initializing coupled models, mainly adding contemporaneous information about the upper-ocean heat content, to achieve forecast quality beyond that provided by the externally forced signal and to improve model performance by correcting errors due to climate change commitment [Meehl et al., 2009; Murphy et al., 2010; Solomon et al., 2011]. Results presented in this study support these premises for the low-frequency SST variability associated with the AMV. For instance, initialization in the forecast system considered here appears to advance prediction of the AMV-related SST anomalies in the Gulf Stream with respect to the uninitialized re-forecasts.
 The multi-annual prediction skill assessment suggests that initialization yields the largest improvement, with respect to the component driven by varying radiative forcing, over the area in which the observed Atlantic subpolar gyre is located. This finding is consistent with recent studies showing skillful predictions of AMOC variations in initialized forecast systems, in which the subpolar gyre dynamics is important [Yeager et al., 2012; Robson et al., 2012; Pohlmann et al., submitted manuscript, 2012]. Moreover, the northern branch of the observed AMV pattern, which extends over subpolar latitudes (AMV-SP), has been found to show a skill pattern similar to that of the AMV index, while the subtropical index (AMV-ST) has a much lower skill. This result translated into statistically significant skill of the AMV-SP index in the one- and five-year persistence empirical models for the average of the first four forecast years. Likewise, the main initialized forecast system considered here (DePreSys) yielded skillful predictions of AMV-SP in the whole forecast range, outperforming damped persistence, with correlations of ∼0.8. The comparison of AMV-SP and AMV-ST also suggested that the apparent skill decay of the AMV in initialized decadal predictions could be linked to the inclusion of subtropical latitudes in the SST average. This conclusion is robust after sub-sampling the DePreSys decadal re-forecasts with five-year intervals between start dates, as considered in the CMIP5 experimental setup, and for the ENSEMBLES multi-model re-forecasts (Figure S5 in Text S1). Nevertheless, a further assessment of this finding will be provided by the CMIP5 multi-model decadal hindcasts, which include prescribed solar variations and volcanic aerosols.
 The uninitialized forecast system considered here (NoAssim) yielded positive correlation coefficients larger than 0.3 for the AMV, AMV-SP, and AMV-ST indices. This prediction skill may come from the aerosol scheme in HadCM3 that factors in direct and first indirect effects of sulphate radiative forcing [Doblas-Reyes et al., 2010], physics which has been recently suggested to contribute to North Atlantic SST variability [Booth et al., 2012]. Although the intriguing increase of skill with forecast time of the three indices in NoAssim appears to come just from properties of the ensemble-mean (Figure S6 in Text S1) [García-Serrano and Doblas-Reyes, 2012], the impact of the varying radiative forcing is undeniable. This externally-forced component showed skill in capturing AMV-related SST anomalies over the Labrador and Irminger convection regions. These areas have been shown to be primary agents of AMOC variability [e.g., Ortega et al., 2011]. Thus, our findings are consistent with the idea that, although initializing the internal variability is important, the varying radiative forcing can substantially influence the low-frequency ocean variability and predictability in the North Atlantic [e.g., Meehl et al., 2007; Guemas and Salas-Mélia, 2008; Ortega et al., 2012].
 The authors wish to thank Virginie Guemas (IC3, Spain) and Masazaku Yoshimori (AORI, Japan) for useful discussions. We also thank the anonymous reviewers for their constructive comments. This study was supported by the Spanish MICINN-funded RUCSS project (CGL2010-20657) and the European Union's (EU) FP7-funded QWeCI project (ENV-2009-1-243964). CASC was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) process 306664/2010-0 and the EU FP7-funded CLARIS-LPB project under grant agreement 212492. Technical support at the Climate Forecasting Unit (IC3) is gratefully acknowledged.
 The Editor thanks the two anonymous reviewers for their assistance in evaluating this paper.