The present study offers for the first time the validation of decadal prediction systems upon the West African monsoon (WAM) variability. The ENSEMBLES multimodel and perturbed parameter decadal reforecasts are used to assess multiyear prediction skill for the dominant WAM precipitation regimes. Thus, the focus of the assessment is on time scales longer than seasonal to interannual. To retain lower-frequency predictability (interannual to decadal), a 4 year average is applied, which indeed has been shown to remove most of the interannual variability that is unpredictable beyond 1 year in dynamical forecasting (e.g., El Niño–Southern Oscillation). First, the decadal hindcasts are analyzed to assess forecast skill of Guinean and Sahelian area-averaged rainfall indices. Findings suggest that there is no significant skill in predicting these rainfall indices, probably due to the distinctive representations of deep tropical convection in each forecast system. This is further addressed by computing and comparing the leading modes of WAM variability in the ENSEMBLES decadal reforecasts against observations. Results show that while in the observations, global warming has an important role, in the forecast systems, the Atlantic Ocean is the main player. The Atlantic Niño represents the leading forcing for the simulated Guinean precipitation. Sea surface temperature (SST) anomalies associated with the simulated Sahelian precipitation project onto the Atlantic multidecadal variability (AMV), in which the subtropical branch shows consistency across the forecast systems. No significant skill has been found, however, to predict these WAM precipitation modes, although the Sahelian pattern presents systematic positive correlation scores and lower root mean square errors along the whole forecast range. This is reflected in a tendency for reproducing the Sahel dry period around the 1980s. 
Likewise, the good performance across the models in simulating the relationship between the leading rainfall modes and the surrounding SST forcings points out encouraging prospects for decadal forecasting. Previous studies show multiyear prediction skill of the AMV in the ENSEMBLES decadal reforecasts. Here the skill of the Atlantic-3 SST index is discussed.
 Decadal prediction aims to explore the benefits of initializing coupled models, mainly prescribing the upper ocean heat content, for getting prediction skill beyond the externally forced trend. The time horizon for decadal prediction is a 10 to 30 year period, bridging seasonal forecasting and climate-change projections, and with the challenge of formulating trustworthy multiyear predictions. Decadal forecasts are motivated by the evidence that current climate models can, to a certain degree, capture not only the impact of changing atmospheric composition but also the evolution of slow natural variations of the climate system [Meehl et al., 2009; Murphy et al., 2010; Solomon et al., 2011]. In the next few decades, internal climate variations, particularly at regional scales, are expected to have similar amplitude compared to regional expressions of the anthropogenically forced global warming [Hawkins and Sutton, 2009], which justifies the attempt to predict the associated part of the climate variability. The target process of this study is the West African monsoon (WAM). The understanding and predictability of the WAM activity are fundamental for countries in West Africa, whose economies are mostly based on rainfall-fed agriculture, which makes them especially vulnerable to climate variability [Sultan et al., 2005]. The availability of actionable climate information for the WAM system at interannual to decadal time scales is, thus, of major relevance.
 The WAM is a boreal summer monsoonal system sensitive to both local forcing and remote influences [e.g., Folland et al., 1986; Fontaine and Janicot, 1996; Fontaine et al., 1998]. The WAM variability spans a wide range of time scales from intraseasonal [e.g., Sultan et al., 2003] to interdecadal [e.g., Janicot et al., 2001]. The interannual variability of the WAM is represented by changes in precipitation over coastal regions of the Gulf of Guinea, for which the equatorial Atlantic sea surface temperature (SST) variability or Atlantic Niño is the main oceanic forcing [Janicot et al., 1998; Vizy and Cook, 2001; Okumura and Xie, 2004; Losada et al., 2010]. The decadal variability of the WAM is well captured by low-frequency rainfall fluctuations over the semiarid African Sahel, the southern edge of the Sahara [Folland et al., 1986; Nicholson, 1993; Fontaine et al., 1995]. The Sahelian rainfall variability is related to contrasting patterns of SST anomalies on a near-global scale, projecting onto an interhemispheric signature [Folland et al., 1986; Palmer, 1986; Rowell et al., 1995; Fontaine et al., 1998]. As part of this interhemispheric SST pattern, the main oceanic forcings of the Sahel precipitation are the Indian Ocean decadal variability [Giannini et al., 2003, 2005; Bader and Latif, 2003], the Atlantic multidecadal oscillation or (hereafter) Atlantic multidecadal variability (AMV) [Hoerling et al., 2006; Mohino et al., 2011], and the Pacific decadal oscillation or the basinwide interdecadal Pacific oscillation (IPO) [Joly, 2008; Mohino et al., 2011]. Significant contributions to decadal trends of the Sahelian anomalous rainfall seem to be also externally forced, where anthropogenic aerosol loadings [Held et al., 2005; Biasutti and Giannini, 2006] and greenhouse gases [Haarsma et al., 2005; Lu and Delworth, 2005; Biasutti et al., 2008] are the key players.
 The Guinean precipitation and the Sahelian rainfall account for most of the SST-forced WAM variability at interannual to decadal time scales. When climate models are forced with observed SSTs, they successfully reproduce the observed interannual Guinean and decadal Sahelian rainfall variabilities [Giannini et al., 2003, 2005; Moron et al., 2003; Lu and Delworth, 2005; Tippett and Giannini, 2006]. Thus, the SST forcing can be considered the dominant driver of the WAM rainfall variability. However, it is not the only factor affecting rainfall in this monsoonal region, for at least three reasons [Fontaine et al., 2011]: (i) atmospheric internal variability contributes strongly to driving the simulated precipitation variability at decadal to multidecadal time scales [Caminade and Terray, 2010]; (ii) land-surface vegetation processes and dust feedbacks may amplify rainfall anomalies [Biasutti et al., 2008]; and (iii) global warming impacts both multidecadal SST variability and monsoonal circulation [Paeth and Hense, 2004]. Correctly simulating the interannual to decadal variability of the WAM rainfall with coupled models is, thus, particularly challenging because of the competition among all the physical mechanisms above. A clear example of this comes from the assessment of climate change projections for the WAM, for which no consensus has emerged regarding the impact of anticipated greenhouse gas forcing on the hydrology of the Sahel in the second half of the 21st century [Hulme et al., 2001; Druyan, 2010]. Caminade and Terray  note the wide range of contradicting outcomes for African rainfall trends toward the end of the 21st century. Cook and Vizy , likewise, point out that many global climate models in the third phase of the Coupled Model Intercomparison Project (CMIP3) simulate flawed representations of the WAM climate. Also, conclusively, Biasutti et al.  find that evidence for any projection of WAM rainfall is uncertain. 
However, as decadal prediction represents a joint problem of initial and boundary conditions [Meehl et al., 2009], it may reveal potentially predictable components of the Guinean and Sahelian precipitation regimes from initializing with the observed, contemporary climate state. Results that hold across a variety of models are desirable given the imperfection of the models. In this study, the multiyear prediction skill of the WAM rainfall has been assessed by employing a multimodel ensemble and a perturbed-parameter ensemble that allows addressing several model-dependent conclusions as well as some problems of model uncertainty.
 Focusing on the model performance for the 20th century, the studies by Joly et al.  and Joly and Voldoire [2009b] evaluate how state-of-the-art climate models in CMIP3 simulate the relationship between tropical-extratropical SSTs and the WAM. They treat separately the high-frequency (i.e., interannual) and low-frequency (i.e., multidecadal) variabilities. Their results confirm that the WAM precipitation is significantly connected to regional and global SST anomalies on both time scales. However, in most of the CMIP3 simulations, the interannual variability of SST is very weak in the Gulf of Guinea (the Atlantic Niño), especially along the Guinean coast. As a consequence, the influence on the monsoon rainfall over the African continent is hardly reproduced [Joly and Voldoire, 2009b]. Joly et al.  emphasize the models' difficulty in simulating the response of the local intertropical convergence zone (ITCZ) to Atlantic SST anomalies. The influence of El Niño–Southern Oscillation (ENSO) on the WAM is also barely simulated by CMIP3 coupled models [Joly and Voldoire, 2009a]. Concerning the low frequency, only a few models capture some features of the Sahelian rainfall and its relation to the well-known interhemispheric SST pattern at decadal/multidecadal time scales. All these errors in model simulations suggest that additional research is required to better assess the relative roles of the ocean in driving the WAM rainfall variability. This is particularly important for decadal prediction since multiyear forecast skill relies on, among other things, the successful representation of the SST forcings at interannual to decadal time scales, which are supposed to be included in the initialization from the ocean state. 
The present study evaluates how the decadal forecast systems included in the ENSEMBLES project simulate and predict the dominant WAM rainfall regimes, assessing the skill both for area-averaged precipitation indices representative of the Guinean and Sahelian rainfall and for the leading precipitation variability modes derived from a principal component analysis/empirical orthogonal function (PCA/EOF). Thus, the approach adopted here complements that of Joly and Voldoire [2009a, 2009b] and Joly et al.  regarding the performance assessment of the WAM in current coupled climate models. In addition, the present study offers, for the first time, a validation of decadal prediction systems for the WAM variability.
2 Data Sets and Methods
 The study analyzes sets of 10 year climate retrospective forecasts, also known as decadal reforecasts or hindcasts, which were produced as part of the European Union-funded ENSEMBLES project [Doblas-Reyes et al., 2010]. The experimental setup is at the heart of the experimental design of the decadal prediction component of the ongoing Fifth Coupled Model Intercomparison experiment (CMIP5), which will contribute to the next Intergovernmental Panel on Climate Change Assessment Report (AR5). The use of the ENSEMBLES decadal reforecasts allows addressing several model-dependent conclusions. Two contributions addressing the problem of model uncertainty, a multimodel and a perturbed-parameter ensemble, are used. The ENSEMBLES multimodel reforecasts consist of 10 year long ensemble dynamical forecasts initialized once every 5 years over the period 1960–2005 (i.e., 1960, 1965 …), have three members per model, and start on 1 November of each start date. The multimodel ensemble was produced by four European research centers: the European Centre for Medium-Range Weather Forecasts (ECMWF, UK), the Met Office-Hadley Centre (UKMO, UK; with the HadGEM2 climate model), the Leibniz Institute of Marine Sciences at Kiel University (IFM-GEOMAR, Germany), and the Centre for Research and Training in Advanced Scientific Computing (CERFACS, France). The perturbed-parameter ensemble is known as the Met Office Decadal Climate Prediction System (DePreSys) [Smith et al., 2007, 2010] and was run using a nine-member ensemble of HadCM3 model variants. In order to assess the impact of initialization, two sets of decadal reforecasts were run, with and without initializing the contemporary state of the climate system; these reforecasts will be referred to as DePreSys and NoAssim, respectively.
 These decadal integrations aim at exploring some indication of regional decadal predictability beyond the slow and relatively predictable warming of the planet, thereby opening the possibility of forecasting low-frequency internal climate variability. The objective of this research is to evaluate the skill of the WAM rainfall predictions and the contribution to the skill by the initialization of the variability. Three reference data sets for precipitation have been used, namely, the gridded-gauge land-surface rainfall from the Global Precipitation Climatology Centre (GPCC) [Rudolf and Schneider, 2005], the new Climatic Research Unit (CRU) TS3.1 land-only rainfall data set up to 2009 [Mitchell and Jones, 2005], and satellite-gauge estimates from the Global Precipitation Climatology Project (GPCP) [Adler et al., 2003]. We show results mainly from GPCC, but it is important to notice that the skill assessment is not sensitive to the choice of the observational, verification data set. The period of the study is 1961–2009, and the seasonal average is from July through September (JAS), which corresponds to the heart of the WAM rainy season.
 In order to characterize the WAM rainfall variability and assess its prediction skill, principal component analysis/empirical orthogonal function (PCA/EOF) [von Storch and Zwiers, 2001] of the JAS precipitation anomalies is employed. The domain used for the PCA is the region 10°S–30°N/20°W–30°E. PCA provides a set of spatial patterns (empirical orthogonal functions, EOFs) and associated standardized time series (principal components). The information associated with each PCA mode is completed by the corresponding fraction of explained variance (fvar). The PCA results are presented in terms of homogeneous regression maps and heterogeneous correlation maps for precipitation and surface temperature, respectively. Surface temperature of the decadal reforecasts includes SST over sea and soil temperature over land in all predictions except in UKMO, for which only SST is available. The observational SST data set used in the work is the National Oceanic and Atmospheric Administration extended reconstructed SST v3b (ERSST) [Smith et al., 2008]. For a given forecast time, model seasonal anomalies are calculated by subtracting the corresponding model seasonal climatology along the actual time (start date dimension). According to recent guidance papers [Goddard et al., 2012; Meehl et al., 2013], which identify the forecast time averaging that has an actionable information scope, a 4 year average is performed upon these drift-corrected anomalies in order to retain interannual to decadal predictability. Four year averages are considered as it represents a compromise between the capability of partially removing the unpredictable interannual variability in near-term dynamical forecasting (e.g., the link to ENSO) and the ability to partially represent skill evolution along the forecast time [García-Serrano and Doblas-Reyes, 2012]. It is to note that the skill assessment discussed in the work is not strongly affected by the degree of averaging. 
The subset of observational data used to estimate the respective climatology is selected following a per-pair method [García-Serrano and Doblas-Reyes, 2012], taking into account only years when both observational and model data are available, instead of using the full record. In this case, a separate observational climatology is estimated for each forecast time. Thus, and according to observational availability, the first 4 forecasting years have 10 values along the actual time following the 5 year interval between start dates; the next 5 forecasting years (fifth to ninth) include nine values along the actual time; and finally, the last forecasting year (tenth) just eight. For instance, the first year of reforecast includes predictions for the years 1961, 1966, 1971, 1976, 1981, 1986, 1991, 1996, 2001, and 2006, whereas the ninth includes 1969, 1974, 1979, 1984, 1989, 1994, 1999, 2004, and 2009.
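The sampling and averaging steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing code used in the study: the function and variable names are hypothetical, and only the start dates (1960 to 2005, every 5 years), the end of the observational record (2009), and the 4 year averaging window are taken from the text.

```python
# Minimal sketch; names are hypothetical, dates follow the text.
START_DATES = list(range(1960, 2006, 5))  # 1960, 1965, ..., 2005
LAST_OBS_YEAR = 2009                      # end of the verification period
N_FORECAST_YEARS = 10


def verification_years(forecast_year):
    """Years verified at a given forecast year (1-10) for which observations exist."""
    return [s + forecast_year for s in START_DATES
            if s + forecast_year <= LAST_OBS_YEAR]


def drift_corrected_anomalies(values):
    """Remove the climatology estimated along the start-date dimension.

    `values` holds one forecast time (e.g. forecast year 3) for every
    available start date; applying the same function to the matching
    observed values gives the per-pair observed anomalies.
    """
    climatology = sum(values) / len(values)
    return [v - climatology for v in values]


def four_year_mean(anomalies_by_forecast_year, first_year):
    """Average one start date's anomalies over forecast years first_year..first_year+3."""
    window = anomalies_by_forecast_year[first_year - 1:first_year + 3]
    if len(window) != 4:
        raise ValueError("window must lie within the 10 forecast years")
    return sum(window) / 4.0


# Number of verification pairs available at each forecast year (1 to 10).
counts = [len(verification_years(k)) for k in range(1, N_FORECAST_YEARS + 1)]
print(counts)  # [10, 10, 10, 10, 9, 9, 9, 9, 9, 8]
```

The printed counts reproduce the 10, 9, and 8 values per forecast year quoted in the text, and `verification_years(9)` returns the years 1969 through 2009 listed for the ninth forecast year.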
 The PCA approach selected to evaluate the reproducibility and skill of the WAM in the ENSEMBLES decadal reforecasts is identical to that used by Philippon et al.  to evaluate the performance of the ENSEMBLES seasonal hindcasts for forecasting the WAM at the monthly time scale. PCA is applied separately to each individual model, which makes it possible to take into account the way each model reproduces the WAM. This approach follows the “Pmod” methodology proposed by Doblas-Reyes et al. . The ensemble members of each forecast system, three members in the multimodel ensemble and nine different versions in the perturbed-parameter ensemble, are concatenated in the time dimension before computing the corresponding PCA. Regarding the spatial dimension, both land-only and land-ocean rainfall subsets have been considered. Land-only model precipitation yielded odd and scarce EOF patterns and inconsistent SST signatures, except for CERFACS, DePreSys, and NoAssim. Full-field rainfall EOF patterns, including data over the ocean, are thus shown for the forecast systems contributing to ENSEMBLES to better illustrate the differences in precipitation between the two leading modes, as well as the model performance of tropical convection.
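As an illustration of the member concatenation described above, the following sketch computes the leading EOFs, standardized principal components, and explained variance fractions (fvar) with a singular value decomposition in NumPy. The function name, the array layout, and the absence of any area weighting are assumptions made for the example; the text does not specify how the PCA was implemented.

```python
import numpy as np


def leading_modes(members, n_modes=2):
    """PCA of anomalies with ensemble members concatenated along time.

    members: array of shape (n_members, n_times, n_points) holding
             precipitation anomalies (hypothetical layout).
    Returns (eofs, pcs, fvar, regression_maps).
    """
    n_members, n_times, n_points = members.shape
    # Concatenate the members in the time dimension.
    data = members.reshape(n_members * n_times, n_points)
    data = data - data.mean(axis=0)
    n = data.shape[0]

    u, s, vt = np.linalg.svd(data, full_matrices=False)
    fvar = s**2 / np.sum(s**2)               # fraction of explained variance
    eofs = vt[:n_modes]                      # spatial patterns (unit norm)
    pcs = u[:, :n_modes] * np.sqrt(n - 1)    # standardized principal components
    # Regression of the analyzed field onto each standardized PC.
    regression_maps = pcs.T @ data / (n - 1)
    return eofs, pcs, fvar[:n_modes], regression_maps
```

With three members and ten 4 year averages per member, for instance, the concatenated sample passed to the decomposition has 30 time steps.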
 The statistical significance for multiyear prediction skill is assessed throughout the study with a one-tailed t test for positive correlations different from zero at the 0.05 significance level. The effective degrees of freedom for the latter are calculated by taking into account the autocorrelation of the corresponding observed time series [von Storch and Zwiers, 2001; Zieba, 2010]. This leads to confidence intervals (gray shading in the plots) that vary in the course of the forecast time and with the particular phenomenon considered, instead of a straight line. A different estimation of the degrees of freedom for the model correlation maps, associated with the precipitation EOFs, has been adopted. To facilitate the comparison among the forecast systems, i.e., DePreSys and NoAssim with the multimodel (as they have different ensemble members), an effective number of spatial degrees of freedom has been calculated from the observational PCA; this estimation depends only on the partitioning of the variance between the EOFs [Bretherton et al., 1999]. This approach has led to a common statistical threshold for correlations of 0.52 at the 0.05 significance level with a two-tailed t test. Thus, one can easily compare the strength of the linear relationship between the rainfall variability modes and their SST signatures across the forecast systems and along the forecast time.
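The ingredients of these significance tests can be sketched as follows. The sketch rests on assumptions that the text does not confirm: the effective sample size uses the common lag-1 (AR(1)) approximation n_eff = n(1 - r1)/(1 + r1), which is only one of several estimators discussed in the literature, and the spatial degrees of freedom use the eigenvalue-based ratio (sum of eigenvalues squared over sum of squared eigenvalues). All function names are hypothetical, and the critical t value must be supplied by the user from a table or a statistics library.

```python
import math


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a time series (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom


def effective_sample_size(x):
    """Assumed AR(1) approximation, capped between 2 and the series length."""
    n = len(x)
    r1 = lag1_autocorrelation(x)
    return max(2.0, min(float(n), n * (1.0 - r1) / (1.0 + r1)))


def correlation_t_statistic(r, n_eff):
    """t statistic of a correlation r with n_eff - 2 degrees of freedom."""
    return r * math.sqrt((n_eff - 2.0) / (1.0 - r * r))


def critical_correlation(t_crit, dof):
    """Correlation threshold matching a critical t value for `dof` degrees of freedom."""
    return t_crit / math.sqrt(t_crit * t_crit + dof)


def spatial_degrees_of_freedom(eigenvalues):
    """Effective spatial degrees of freedom from the PCA eigenvalue spectrum."""
    total = sum(eigenvalues)
    return total * total / sum(ev * ev for ev in eigenvalues)
```

As a purely illustrative use of `critical_correlation`, a two-tailed 5% test with 13 degrees of freedom (critical t of about 2.160) gives a threshold of about 0.51; this number only demonstrates the formula and is not a statement about the degrees of freedom obtained in the study.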
3.1 Rainfall Indices
 The forecast quality assessment of the WAM in the ENSEMBLES multimodel and perturbed-parameter decadal reforecasts is analyzed by using two different indices: (i) the Guinean rainfall (hereafter GUI), precipitation anomalies averaged over 5°N–10°N/10°W–10°E; and (ii) the Sahelian rainfall (hereafter SAH), precipitation anomalies averaged over 10°N–18°N/15°W–15°E. Notice that only land points are used in the computation.
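A minimal sketch of how such land-only box averages could be computed is given below. The box limits are those quoted above; the function name, the nested-list grid layout, the boolean land mask, and the cosine-of-latitude weighting are assumptions for illustration, since the text does not state how grid points were weighted.

```python
import math

# Box limits from the text: (lat_min, lat_max, lon_min, lon_max), degrees.
GUI_BOX = (5.0, 10.0, -10.0, 10.0)   # Guinean rainfall index
SAH_BOX = (10.0, 18.0, -15.0, 15.0)  # Sahelian rainfall index


def box_index(field, lats, lons, land_mask, box):
    """Area-weighted mean of `field` over the land points inside `box`.

    field, land_mask: nested lists indexed [lat][lon] (hypothetical layout);
    land_mask is True over land. Weights are cos(latitude), an assumption.
    """
    lat_min, lat_max, lon_min, lon_max = box
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max and land_mask[i][j]:
                total += weight * field[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no land points inside the box")
    return total / weight_sum
```

Calling `box_index(anomalies, lats, lons, land_mask, SAH_BOX)` for each verification year would then yield one value of the SAH index per year, and likewise for GUI.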
 Figure 1 shows the ensemble-mean anomaly correlation coefficient between the single forecast systems contributing to the ENSEMBLES multimodel (colored thin lines), the multimodel ensemble-mean (MME; thick black), DePreSys (thick purple), and NoAssim (thick pink) against JAS GPCC precipitation for the GUI (a) and SAH (b) rainfall indices. The correlations are shown for each 4 year average in the forecast time. No significant multiyear prediction skill is found for any of the rainfall indices, although it is clear that the SAH time series perform better, with the ENSEMBLES multimodel showing systematically positive correlations along the whole forecast time (Figure 1b, thick black). CERFACS yields significant scores for SAH at the end of the reforecast. This fact, together with the increase of correlation for IFM-GEOMAR, drives the MME to the edge of statistical significance in the forecast averages 6–9 and 7–10. However, the limited sampling of start dates prevents any conclusion about predictability based on model-dependent behaviors. From the comparison between initialized (DePreSys, thick purple) and uninitialized (NoAssim, thick pink) decadal reforecasts, it is found that there is no multiyear skill in forecasting the WAM rainfall using the radiative forcing changes due to greenhouse gases, solar irradiance, and anthropogenic aerosols (i.e., boundary conditions) nor from internal variability (e.g., ocean). The coherence between the MME and DePreSys ensembles supports that conclusion.
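For reference, the ensemble-mean anomaly correlation used as the skill score here can be written compactly as below. This is a generic sketch with hypothetical function names: it assumes the inputs are already anomalies for one forecast average, paired along the start dates, and it does not include the significance test.

```python
import math


def ensemble_mean(members):
    """Average the members at each start date; members is a list of equal-length lists."""
    n_members = len(members)
    return [sum(values) / n_members for values in zip(*members)]


def anomaly_correlation(forecast, observed):
    """Pearson correlation between forecast and observed anomalies."""
    n = len(forecast)
    f_mean = sum(forecast) / n
    o_mean = sum(observed) / n
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecast, observed))
    f_var = sum((f - f_mean) ** 2 for f in forecast)
    o_var = sum((o - o_mean) ** 2 for o in observed)
    return cov / math.sqrt(f_var * o_var)
```

The score for one forecast system and one forecast average would then be `anomaly_correlation(ensemble_mean(members), observed)`.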
 One could argue that the prediction skill for interannual to decadal variability in the WAM rainfall is limited by the distinctive representation of ITCZ-related deep convection in each forecast system, reflecting the actual limitations in current coupled models [e.g., Cook and Vizy, 2006]. This is further addressed in the next section. Here an attempt is made to illustrate some systematic errors of the coupled models, which could be used to improve their forcings and initialization procedures to give more skilful predictions over the WAM domain. Figure 2 shows the bias of each forecast system in the course of the forecast time, or models' drift, for the GUI (top) and SAH (bottom) rainfall indices. ECMWF and CERFACS overestimate the GUI precipitation, DePreSys and NoAssim (i.e., HadCM3) and IFM-GEOMAR underestimate it, whereas UKMO (i.e., HadGEM2) shows almost no bias. Only CERFACS yields a marked, drying drift for GUI during the first 4 forecasting years. Concerning the SAH rainfall, UKMO and CERFACS overestimate the amount of precipitation, whereas ECMWF and IFM-GEOMAR underestimate it. DePreSys and NoAssim show no bias. In this case, only ECMWF yields a marked, wetting drift that brings the model closer to observations during the second half of the reforecast.
 Figure 3 depicts the precipitation systematic error of the coupled models but in its spatial distribution. To better illustrate the performance of the local tropical convection, data over both land and ocean have been used. The observational reference data set in this case is GPCP and the time period covered 1979–2009. Note, however, that the results over land are consistent between GPCP and GPCC in the common period (not shown). For a better comparison with the EOF patterns discussed in the following section, this bias assessment employs 4 year forecast averages. Hence, the first forecast period (1–4 years) compares observations and decadal reforecasts with start dates from 1980 to 2005, whereas the rest of the forecast averages uses start dates from 1975 to 2000. In this way, having six data points along the actual time albeit overlapping most of the observational record, the systematic errors in Figure 3 may suffer from large uncertainties but still give a useful picture. To complement this approach, Figure 4 shows the drift with forecast time of the SST annual cycle in the Gulf of Guinea, averaged over 5°S–5°N/15°W–10°E [Joly and Voldoire, 2009b], which is a key component of the WAM coupled system. As expected from Figure 2, no clear evolution of the precipitation systematic error with forecast time has been found after applying a 4 year average; for that reason, biases for only two target forecast periods are shown in Figure 3 (middle and right columns), whereas the estimated mean seasonal rainfall is shown only for the first 1–4 years (left column). This contrasts with some evolutions with forecast time of model SST annual cycle; for that reason, an intermediate forecast average is also shown in Figure 4b (3–6 years). 
Focusing on JAS, Figure 4 points out a slight warming drift in UKMO but clearer cooling drifts in CERFACS and ECMWF, the latter being the only system in which a response in the ITCZ can be elucidated, with a slight northward displacement of the positive precipitation bias (Figure 3, top). These results suggest that land-sea interactions and soil-moisture processes can also be at work in shaping the precipitation systematic errors of the forecast systems. For their part, IFM-GEOMAR, DePreSys, and NoAssim show no drift in the Gulf of Guinea SST annual cycle (Figure 4), likely due to their anomaly initialization strategy [e.g., Doblas-Reyes et al., 2010].
 Consistent with Figure 2, CERFACS is the forecast system with the largest overestimation of rainfall over the Sahel (Figure 3, third row), while ECMWF clearly underestimates it along its longitudinal belt (Figure 3, first row). However, both, CERFACS and ECMWF, show positive precipitation biases in the Gulf of Guinea coastline [cf. Figure 2]. As also shown in Figure 2, UKMO, DePreSys, and NoAssim are the forecast systems with the lowest deviations from the observed WAM rainfall (Figure 3), although they all have positive SST biases in the equatorial Atlantic during JAS (Figure 4). The precipitation deficit IFM-GEOMAR shows for both GUI and SAH indices (Figure 2) appears to be due to a too southward location of the ITCZ, which goes along with the strong positive SST bias in the Gulf of Guinea during JAS (Figure 4) and translates into a clear overestimation of the tropical convection over the sea and a dry systematic error over the eastern Sahel (Figure 3). Interestingly, a common feature of the rainfall systematic error in all forecast systems considered is the apparent underestimation of the monsoonal precipitation in the westernmost part of the continent, the region extending over Senegal, Guinea, and Liberia.
 Finally, it is worth noting that there is no correspondence between a better WAM representation (lesser drift/bias) and a higher skill. Particularly evident are, for instance, the cases of DePreSys, which shows no bias but still has poor skill, and CERFACS, which shows strong biases but nevertheless has the highest correlation for the SAH rainfall index [cf. Figures 1b and 2, bottom].
3.2 Rainfall Variability Modes
 As the regional average indices show limited skill, the forecast quality assessment of the WAM is also carried out by computing the dominant modes of WAM variability. According to what has been described in the previous section, this complementary approach may be better suited to the individual performance of tropical convection in each forecast system. Principal component analysis/empirical orthogonal function (PCA/EOF) has been performed upon the GPCC, CRU, and GPCP observational data sets (Figures 5, 6) and the ENSEMBLES reforecasts (Figures 7-12). The results reveal distinct representations of the WAM in different global climate models, although common features have emerged. To avoid repetition while still showing all relevant aspects of the decadal prediction systems, only three target forecast averages are shown in the following: 1–4, 3–6, and 6–9 years.
 Although the leading modes of observed WAM variability have been widely documented elsewhere, there are some aspects specific to a forecasting context that deserve an adequate description. The first GPCC mode is a zonal dipole-like anomalous pattern between eastern and western parts of the Guinean coastline (fvar = 25.5%) and corresponds to the global warming signature [Mohino et al., 2011]. The second GPCC leading mode is associated with the Sahelian mode (fvar = 11.9%) and a global scale SST pattern, which regionally projects onto the AMV- and IPO-related SST anomalies (as introduced in section 1). The third GPCC leading mode is tightly related to the Guinean rainfall (fvar = 10.4%) and the Atlantic Niño SST anomaly (Figure 5). The robustness of these results is assessed against the CRU dominant modes. The first CRU mode corresponds to the Sahelian rainfall (fvar = 20%), and the second CRU mode is associated with the Guinean precipitation (fvar = 14%). The associated SST patterns show the well-known interhemispheric pattern that includes the AMV in the Atlantic and the IPO in the Pacific, and the Atlantic Niño, respectively (not shown). However, there is no mode in CRU related to global warming (Figure 6). This finding is intriguing; however, the analysis of the differences between both sets based on land gauge data is beyond the scope of this study. Nonetheless, it is interesting to note that regression analyses using both GPCC [Smith et al., 2012] and CRU [Ting et al., 2009; Mohino et al., 2011] reveal a pattern associated with global warming that is very consistent with that shown here in Figure 5 (GPCC EOF1). Trend analyses based on other observational data sets suggest, likewise, a fingerprint of the long-term climate change over western tropical Africa [Hoerling et al., 2006]. All this evidence indicates that the anthropogenically forced signal over the Guinean coastline is a robust feature of the WAM in recent decades [Ting et al., 2009]. 
If global warming represents a major influence on the WAM system, this should emerge even in a shorter, but more accurate, observational data set as GPCP. This appears to be the case. The leading GPCP precipitation mode (fvar = 28.3%; not shown) projects onto the first GPCC mode, although the negative anomalies in the corner of the bay of Benguela barely come out. The associated SST anomalies show a warming globally as in the GPCC mode, and the time series depicts an increasing trend (Figure 6a, gray). The offset between the GPCC and GPCP time series is due to the respective, different climatological period. The correlation between the GPCC and GPCP PCs related to the global warming is 0.74 using yearly data and 0.89 with a 4 year average. The subsequent GPCP modes consistently correspond to the Sahelian (fvar = 23%) and Guinean (fvar = 13%) rainfall patterns. Because of the limited sampling in decadal reforecasts, we use in the following just the GPCC principal components, which overlap most of the period analyzed (to the detriment of GPCP). On their part, the correlation between the Sahelian PCs in GPCC and CRU is 0.91 in yearly JAS means and 0.92 when a 4 year average is applied to the time series. The correlation for the Guinean rainfall between these observational data sets is 0.79 based on annual means and 0.72 with a 4 year average (Figure 6). All these results lead us to choose GPCC as reference data set for the comparison against the ENSEMBLES decadal reforecasts described below. However, it is worth noting that the skill assessment discussed in the work is not sensitive to the choice of observational data set; thereby, no significant skill at interannual to decadal time scales in predicting the Guinean and Sahelian rainfall regimes is found.
 The Atlantic Niño is the main driving SST pattern in all forecast systems (Figures 7-12) except in CERFACS, for which no Guinean rainfall mode appears (Figure 9). The ECMWF leading mode shows the dominance of the Atlantic Niño along the whole forecast time, with a correct location of the ITCZ in the Gulf of Guinea that brings rain over the coastline (Figure 7). In the UKMO, the correlation of the Guinean rainfall with the Atlantic Niño becomes stronger as the forecast time increases, although the SST signature is almost isolated from the first forecast period, 1–4 years. The deep convection yields wetter conditions inland the African continent than in ECMWF and so closer to the observations (Figure 8). The IFM-GEOMAR leading mode is linked to an SST pattern over the whole tropical band, with increasing amplitude in the Atlantic Niño area along the forecast time. The reader will note that maximum loadings are over the Angola/Benguela upwelling system. The precipitation pattern is located too far off the Guinean coast yielding even negative rainfall anomalies over land (Figure 10). This feature can be caused by a too southerly position of the ITCZ, in a consistent view with its systematic errors described above (Figures 2-4). DePreSys (Figure 11) and NoAssim (Figure 12) both show a clear and isolated Atlantic Niño SST pattern during the whole reforecast, but with larger correlation scores in the initialized hindcast (DePreSys). The corresponding leading precipitation modes strongly project onto the observed Guinean rainfall pattern.
The Sahelian rainfall represents the second leading mode in all forecast systems, except in CERFACS for which it is the dominant one (Figure 9). As for the Guinean rainfall mode, different models have the precipitation anomalies in different areas. This feature is in close agreement with their distinct representation of the tropical convection (Figure 3). However, it is worth noting how across the variety of coupled models the SST pattern associated with the Sahelian mode correctly projects onto the Atlantic signature of the observed interhemispheric SST signal (Figure 5; see also Introduction), showing in most of the cases a clear AMV-like pattern. In the ECMWF system, the Sahel-related SST signature shows significant correlations off Newfoundland during almost the whole forecast time, displaying even the AMV-like teleconnection to the Mediterranean basin, although the maximum loadings are in the subtropical North Atlantic (Figure 7). The ECMWF forecast system also shows significant SST anomalies in the Pacific basin, which resemble the IPO signal, from 2–5 to 6–9 averages. In the UKMO system, the prevalence of the subtropical North Atlantic SST pattern is clearer (Figure 8). In this forecast system, significant SST anomalies also appear over the Indian Ocean from the 3–6 years period onward. The CERFACS system leading precipitation pattern is associated with an extratropical forcing mainly located over the Mediterranean area, likely related to the AMV (Figure 9). The signal of the AMV is apparent during the first three forecast averages (1–4 to 3–6). For later forecast periods, the forcing is over the eastern Sahara; the signal is reminiscent of the Sahara heat low that Haarsma et al. [2005] and Biasutti et al. [2009] have indicated as one of the main drivers of the Sahelian rainfall. Land-atmosphere interactions could also be at work driving long-term predictability [Giannini et al., 2003, 2005]. 
IFM-GEOMAR yields a very consistent pattern reminiscent of the AMV signal along the whole forecast time, with larger correlations in the middle of the reforecast from 3–6 to 5–8 years (Figure 10). Concomitant with this Atlantic signature, significant SST anomalies are also present in the Pacific basin projecting onto the IPO. This forecast system also suggests significant contributions from the Sahara heat low and land-surface feedback to Sahelian rainfall predictability. Again, DePreSys and NoAssim show similar SST signatures all along the forecast time, projecting onto an AMV-like pattern. In this case, the initialization in DePreSys gets a distinctive signal in the northern North Atlantic in comparison with NoAssim whose loadings are in the subtropical region [cf. Figures 11 and 12].
 The previous results indicate that SST anomalies associated with the simulated Sahelian precipitation project onto the AMV signature, whose subtropical branch shows consistency across the forecast systems. This result is consistent with previous predictability studies addressing the AMV teleconnection to the WAM [e.g., Knight et al., 2006; Zhang and Delworth, 2006; Ting et al., 2009]. Actually, the AMV has been suggested to be a descriptor/predictor of the WAM-Sahelian rainfall [e.g., Mohino et al., 2011; van Oldenborgh et al., 2012]. To illustrate how the decadal forecast systems considered in this work represent the AMV-WAM relationship, Figure 13 shows the AMV pattern and its associated rainfall anomalies for each system. As can be seen, each individual simulated AMV-WAM teleconnection rightly resembles the Sahelian-EOF mode found previously for the corresponding prediction system (Figures 7-12). Different regions of AMV impact in each system are likely related to their specific ITCZ configurations (Figure 3).
 None of the leading precipitation EOFs in the models is significantly associated with the observed EOF mode related to global warming, which instead has a dominant role in GPCC and GPCP (Figures 14c and 14d). The lack of reproducibility of an isolated rainfall mode related to the simulated global warming in the forecast systems might be caused by the common precipitation systematic error where the observed signal has its largest impact, i.e., the region extending over Senegal, Guinea, and Liberia [cf. Figure 3 and Figure 5]. However, the relationship between the simulated Sahelian patterns and the observed mode related to global warming shows positive correlation scores practically all along the forecast time (Figure 14d). As shown for the rainfall indices (Figure 1), no significant multiyear prediction skill is found for any of the precipitation regimes (Figures 14a and 14b), although it is again clear that the Sahelian mode is better predicted showing systematically positive correlations with the observed pattern along the forecast time.
To further illustrate the better forecast quality of the decadal prediction systems for Sahelian rainfall variability, in comparison with the Guinean mode, Figure 15 depicts the ensemble mean root mean square error (RMSE) between the simulated patterns and observed precipitation regimes. Gray shading stands for values above RMSEs of predictions based on climatology, which correspond to the standard deviation of the GPCC principal components at each 4 year forecast average. The more consistent behavior of the simulated Sahelian modes is apparent, as is their smaller exceedance of the climatological RMSE (Figures 15a and 15b). Note also the better performance of DePreSys (purple) with respect to NoAssim (pink), with lower RMSE practically along the whole forecast range (Figure 15b), which goes along with the higher anomaly correlation coefficients shown by the initialized system in Figure 14b.
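The comparison against a climatological forecast can be sketched as follows for a single 4 year forecast average. This is an illustration under stated assumptions, not the processing chain used for Figure 15: the array layout (start dates by ensemble members), the synthetic data, and the function names are all hypothetical.

```python
import numpy as np

def ensemble_mean_rmse(hindcasts, obs):
    """RMSE of the ensemble mean across start dates.

    hindcasts: array (n_start_dates, n_members), one forecast average
    obs:       array (n_start_dates,), verifying observations
    """
    hindcasts = np.asarray(hindcasts, dtype=float)
    obs = np.asarray(obs, dtype=float)
    err = hindcasts.mean(axis=1) - obs
    return float(np.sqrt(np.mean(err ** 2)))

def climatological_rmse(obs):
    """RMSE of always forecasting the observed mean, i.e. the standard deviation of obs."""
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((obs - obs.mean()) ** 2)))

# Synthetic example: 10 start dates, 3 members (illustrative values only)
rng = np.random.default_rng(1)
obs = rng.normal(size=10)
hind = 0.5 * obs[:, None] + rng.normal(scale=0.8, size=(10, 3))

rmse = ensemble_mean_rmse(hind, obs)
reference = climatological_rmse(obs)
print(f"RMSE = {rmse:.2f}, climatological RMSE = {reference:.2f}, "
      f"above climatology: {rmse > reference}")
```

In this sketch a value of `rmse` above `reference` corresponds to the gray-shaded region described above.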
The accuracy of the reforecasts measured by the RMSE, however, does not lead to any conclusion as far as predictability is concerned, since the error growth does not show a systematic behavior with forecast time. This may reflect difficulties coming from the low start date frequency in ENSEMBLES, which can affect the estimates due to sampling issues. In addition, it should be noted that decadal reforecasts are intended to evaluate the actual skill for the phenomenon considered rather than to quantify its predictability [Goddard et al., 2012; Meehl et al., 2013]. Targeted predictability studies for the WAM have been reported in the Introduction.
A well-configured ensemble prediction system should be nondispersive. A widely used indicator for this is that the RMSE of the ensemble mean is of a magnitude similar to the spread; a spread smaller (larger) than the RMSE is called underdispersive (overdispersive) behavior [e.g., Hagedorn et al., 2005]. It is found that whereas the forecast systems are generally underdispersive for the Guinean rainfall mode, their RMSE values tend to match the spread for the Sahelian pattern. The latter indicates that the ENSEMBLES forecast systems are reliable, in a probabilistic sense, when capturing the Sahelian precipitation variability at interannual to decadal time scales.
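The spread-to-RMSE indicator can be sketched as below. The definition of the spread used here (square root of the mean intra-ensemble variance across start dates) is one common convention and is an assumption, as are the function name and the toy numbers.

```python
import numpy as np

def spread_to_rmse_ratio(hindcasts, obs):
    """Ratio of ensemble spread to ensemble-mean RMSE for one forecast average.

    hindcasts: array (n_start_dates, n_members)
    obs:       array (n_start_dates,)
    A ratio below 1 suggests underdispersion, above 1 overdispersion.
    """
    hindcasts = np.asarray(hindcasts, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ens_mean = hindcasts.mean(axis=1)
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    # Spread: root of the start-date mean of the unbiased member variance
    spread = np.sqrt(np.mean(hindcasts.var(axis=1, ddof=1)))
    return float(spread / rmse)

# Toy example: 3 start dates, 2 members (illustrative values only)
toy_hindcasts = [[1.0, 3.0], [2.0, 4.0], [0.0, 2.0]]
toy_obs = [3.0, 4.0, 2.0]
ratio = spread_to_rmse_ratio(toy_hindcasts, toy_obs)
label = "underdispersive" if ratio < 1 else "overdispersive or well matched"
print(f"spread/RMSE = {ratio:.2f} ({label})")
```

With few start dates and members, as in ENSEMBLES, such a ratio is subject to large sampling uncertainty, so values moderately different from 1 should be read with caution.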
However, although reliability is desirable in ensemble prediction systems, it does not necessarily translate into predictive skill. A diagnostic is finally performed to assess whether the ENSEMBLES systems are able to reproduce the Sahel dry period that took place around the 1980s (Figure 6b), which was one of the strongest interdecadal signals observed during the 20th century at global scale [e.g., Rodríguez-Fonseca et al., 2011]. Figure 16 compares the observed climatological value (solid black) and the models' mean climate (solid colored circles) of the SAH index over 1961–2009 with the averages over the target period for the Sahel dryness (dashed black for GPCC; open colored circles for ENSEMBLES). Three start dates are selected for this target period in order to represent the drying and core of the dry period, i.e., 1970, 1975, and 1980. Note that the evolution of the models' mean climate and targeted averages follows the models' drift described in Figure 2. Although it is not as marked as in observations, there is a quite noticeable tendency in all the ENSEMBLES multimodel reforecasts (ECMWF, UKMO, CERFACS, IFM-GEOMAR) to yield drier conditions during the target period than over the complete set of start dates. Concerning the ENSEMBLES perturbed-parameter reforecasts, it can be seen that NoAssim tends to present a drying during the first half of the forecast range (until 3–6 years), but only DePreSys shows drier conditions all along the forecast time with respect to its mean climate. All these results in representing (not predicting) a Sahel dry period around the 1980s are encouraging; however, they do not imply that the forecast systems have skill at the interannual to decadal time scale, for which positive correlations when estimating the skill over a long period would be a prerequisite (Figures 1b and 14b).
4 Summary and Discussion
Climate variations linked to the West African monsoon (WAM) have been shown to be largely affected by both internal, natural variability related to sea surface temperature and recent trends associated with global warming [e.g., Mohino et al., 2011]. The objectives of this study are (i) to describe the characteristics of monsoonal rainfall at multiannual time scales and (ii) to assess its forecast quality with current forecast systems. The ENSEMBLES multimodel and perturbed-parameter decadal reforecasts have been used to assess multiyear forecast skill of two area-averaged precipitation indices that are representative of the Guinean and Sahelian rainfall regimes. The results suggest that there is no significant skill in predicting the precipitation indices, for which the main cause appears to be model deficiencies in simulating the local tropical convection. This result underlines that accurate simulations of rainfall represent a great challenge for atmospheric models [Moron et al., 2003, 2004]. Current state-of-the-art coupled models are still unable to capture the main modes of SST-WAM rainfall covariability [Joly et al., 2007]. However, our results, beyond the biases in the mean state of the simulated WAM, suggest that there are two well-defined phenomena acting in practically all the forecast systems analyzed here. These have been identified by applying principal component analysis/empirical orthogonal function (PCA/EOF) analysis to precipitation in the ENSEMBLES decadal reforecasts. The leading precipitation EOF corresponds in general to the Guinean rainfall and is associated with the Atlantic Niño SST pattern. The second precipitation EOF (except for CERFACS, for which it is the first) is related to the Sahelian rainfall. It is associated predominantly with SST anomalies in the North Atlantic, reminiscent in most of the models of the AMV signature. 
Statistically significant SST anomalies related to Sahel precipitation also appear in other oceans, but the North Atlantic signal is common across the forecast systems. These results are consistent with previous findings obtained using a dynamical approach that suggest the importance of Gulf of Guinea SST and the North Atlantic in WAM variability at interannual to decadal time scales (see Introduction).
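For readers less familiar with PCA/EOF analysis, the decomposition and the fraction of explained variance (fvar) can be sketched with a singular value decomposition. This is a generic illustration, not the exact configuration used in this study: the square-root cosine latitude weighting, the synthetic field, and the function name are assumptions.

```python
import numpy as np

def eof_analysis(field, lat_deg, n_modes=2):
    """EOF decomposition of a (n_time, n_lat, n_lon) field via SVD.

    Returns spatial patterns (n_modes, n_lat, n_lon), principal components
    (n_time, n_modes), and the fraction of variance explained by each mode.
    """
    field = np.asarray(field, dtype=float)
    n_time, n_lat, n_lon = field.shape
    anom = field - field.mean(axis=0)            # remove the time mean
    # Assumed area weighting: sqrt(cos(latitude)) applied to the anomalies
    weights = np.sqrt(np.cos(np.deg2rad(lat_deg)))[None, :, None]
    matrix = (anom * weights).reshape(n_time, n_lat * n_lon)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    fvar = s ** 2 / np.sum(s ** 2)
    pcs = u[:, :n_modes] * s[:n_modes]
    eofs = vt[:n_modes].reshape(n_modes, n_lat, n_lon)
    return eofs, pcs, fvar[:n_modes]

# Synthetic precipitation-like field: two fixed patterns plus noise
rng = np.random.default_rng(2)
lat = np.array([5.0, 10.0, 15.0, 20.0])
n_time, n_lon = 40, 6
pattern_south = np.outer([1.0, 0.5, 0.0, 0.0], np.ones(n_lon))
pattern_north = np.outer([0.0, 0.0, 0.7, 0.7], np.ones(n_lon))
field = (rng.normal(size=(n_time, 1, 1)) * 2.0 * pattern_south
         + rng.normal(size=(n_time, 1, 1)) * pattern_north
         + 0.1 * rng.normal(size=(n_time, lat.size, n_lon)))

eofs, pcs, fvar = eof_analysis(field, lat, n_modes=2)
print("explained variance fractions:", np.round(fvar, 3))
```

The sign of each EOF and its principal component is arbitrary, so patterns from different forecast systems need a common sign convention before they are compared with observed modes.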
 Concerning the Guinean rainfall-Atlantic Niño SST relationship, some remarks can be made. First, it is well known that the Atlantic Niño represents interannual variability of the summer cold tongue in the equatorial Atlantic [e.g., Polo et al., 2008]. However, after applying a 4 year average along the forecast time, which would remove a large part of the interannual variability [García-Serrano and Doblas-Reyes, 2012], the Atlantic Niño is still the dominant oceanic forcing of the WAM in the models. This result should encourage the climate forecasting community to keep working on the improvement of the coupled model performance in the tropical Atlantic.
Contrary to the finding of Joly and Voldoire [2009b] that the simulated Atlantic Niño SST mode in CMIP3 is not necessarily associated with a clear rainfall response, our results show that ENSEMBLES decadal forecast systems correctly capture the observed Atlantic Niño-WAM link. Note that our study is based on WAM rainfall variability, instead of on SST variability, which may better represent the local precipitation. Nevertheless, the bias in Atlantic tropical deep convection and equatorial SST in most of the models has also been evidenced here. This agrees with Huang et al.  who emphasize the strong systematic errors in simulating the SST and low cloud cover in the southeastern tropical Atlantic and the equatorial thermocline in the Gulf of Guinea. It is well known that the SST error is a major cause of the incorrect position of the simulated local ITCZ.
Regardless of the lack of accuracy of the WAM-ITCZ, the skill assessment we have presented here indicates that, until now, there is no significant skill in reforecasting the Guinean rainfall regime (Figures 1a, 14a, and 14c). However, given the marked dominance of the Atlantic Niño in the ENSEMBLES systems, skillful multiyear predictions of the Atlantic Niño could lead to valuable predictability sources for the WAM activity. An attempt to shed some light on this is given in Figure 17, which depicts the spectrum of frequencies for the Atlantic Niño-3 SST index (ATL3; SST anomalies averaged over 20°W–0°E/3°S–3°N) [Zebiak, 1993]. It can be seen that the ATL3 index has two prominent peaks at decadal time scale, namely, around 9 and 13 years, which represent good targets for decadal prediction. Figure 18a shows the ensemble-mean anomaly correlation coefficient with a 4 year average between the ENSEMBLES forecast systems and ERSST. This figure points out that both initialized ensembles, the multimodel (MME, thick black) and perturbed-parameter (DePreSys, thick purple) ensembles, yield skillful predictions of ATL3 almost all along the forecast time. The fact that the uninitialized ensemble (NoAssim, thick pink) also shows significant correlation scores during the second part of the reforecast suggests that there is an important contribution to the skill from the externally forced variability. Figure 18b shows the same skill assessment but after removing the trend, estimated here as the global average of SST anomalies over 60°S–60°N (mimicking the Trenberth and Shea [2006] definition of the AMV index). The skill of ATL3 drops drastically when compared with Figure 18a, but there are still some hints of skill, with positive correlations in almost all initialized systems during the first part of the reforecast. IFM-GEOMAR appears to degrade the skill in the ENSEMBLES multimodel. 
Interestingly, DePreSys yields scores that are statistically significantly different from those of NoAssim during the forecast averages 2–5 and 3–6 (purple symbols). These additional results highlight promising prospects of near-term prediction for the Atlantic Niño, and thus for the WAM rainfall. This could be attained if, for instance, improvements in initialization and/or assimilation strategies are achieved. The ongoing CMIP5 decadal experiments could represent an excellent framework to continue evaluating these encouraging findings. The ATL3 multiyear prediction skill after detrending the time series is a topic that definitely deserves further research. Mechanisms behind that predictability could be low-frequency variations of the thermocline depth related to subtropical gyres and AMOC changes [Haarsma et al., 2008; Tokinaga and Xie, 2011; Wen et al., 2011]. This will be the focus of future work by the authors.
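The ATL3 index and the removal of the global-mean SST signal can be sketched as follows. It is assumed here that the trend is removed by subtracting the 60°S–60°N average from the index, that longitudes run from -180° to 180°, that the grid is regular with no missing values, and that box averages use cosine-latitude weights. The synthetic field and the function names are hypothetical.

```python
import numpy as np

def box_mean(sst, lat, lon, lat_bounds, lon_bounds):
    """Cosine-latitude weighted mean of sst (n_time, n_lat, n_lon) over a lat/lon box."""
    sst = np.asarray(sst, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat_sel = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_sel = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = sst[:, lat_sel][:, :, lon_sel]
    weights = (np.cos(np.deg2rad(lat[lat_sel]))[None, :, None]
               * np.ones((1, 1, int(lon_sel.sum()))))
    return (sub * weights).sum(axis=(1, 2)) / weights.sum()

def atl3_indices(sst_anom, lat, lon):
    """Return the ATL3 index and ATL3 minus the 60S-60N mean SST anomaly."""
    atl3 = box_mean(sst_anom, lat, lon, (-3.0, 3.0), (-20.0, 0.0))
    global_mean = box_mean(sst_anom, lat, lon, (-60.0, 60.0), (-180.0, 180.0))
    return atl3, atl3 - global_mean

# Synthetic SST anomalies: uniform warming plus noise (illustrative only)
rng = np.random.default_rng(3)
lat = np.arange(-89.0, 90.0, 2.0)
lon = np.arange(-180.0, 180.0, 5.0)
n_time = 30
warming = 0.02 * np.arange(n_time)[:, None, None]
sst_anom = warming + 0.3 * rng.normal(size=(n_time, lat.size, lon.size))

atl3, atl3_residual = atl3_indices(sst_anom, lat, lon)
print("trend in raw index:", np.polyfit(np.arange(n_time), atl3, 1)[0])
print("trend in residual index:", np.polyfit(np.arange(n_time), atl3_residual, 1)[0])
```

Subtracting the global mean rather than fitting a linear trend avoids assuming that the forced signal is linear in time, at the cost of also removing any internal variability that projects onto the global average.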
The dominant SST phenomenon that has appeared in our analysis to drive the Sahelian rainfall corresponds to the AMV. This result is consistent with previous evidence showing that the WAM-Sahelian precipitation is one of the most significant teleconnections of the AMV (Figure 13) [Knight et al., 2006; Zhang and Delworth, 2006; Ting et al., 2009; van Oldenborgh et al., 2012]. It is also in close agreement with previous works suggesting that interdecadal changes of the WAM rainy season have a strong link with AMOC-related thermohaline circulation [Chang et al., 2008], as the AMV is thought to be related to multidecadal variations of the AMOC [Knight et al., 2005; Dijkstra et al., 2006; Pohlmann et al., 2009]. However, the AMV SST low-frequency variability seems not to be solely driven by changes in the AMOC, and attempts have been made to identify from observations [Trenberth and Shea, 2006; Guan and Nigam, 2009] and simulations [Ottera et al., 2010; Booth et al., 2012] the part of the signal that is linked to more global changes and external forcings. Previous results suggest that the AMOC-related internal variability drives subpolar SSTs, while subtropical SSTs are largely controlled by radiative changes [Ottera et al., 2010; van Oldenborgh et al., 2012]. This debate remains open in light of the analysis presented here, since the Sahelian rainfall mode in the ENSEMBLES decadal reforecasts is equally correlated with the GPCC Sahelian mode and the observed pattern associated with a global warming signal (GPCC EOF1; Figures 14b and 14d). Note that no significant skill has been found in this dominant WAM mode, but the correlations are positive all along the forecast time and across the different prediction systems. The SST signatures in the North Atlantic related to the simulated Sahelian rainfall do not help to address this question either, as they alternately depict statistically significant anomalies at northern and subtropical latitudes. 
It is noticeable, however, that the subtropical branch of the associated SST pattern shows somewhat more consistency across the forecast systems (Figures 7-12).
The comparison between the initialized (DePreSys; Figure 11) and uninitialized (NoAssim; Figure 12) decadal reforecasts reveals that, effectively, subtropical SSTs associated with the Sahelian rainfall are common to both systems, and that SSTs in the northern North Atlantic are only present in the initialized one. This finding indicates that though rainfall is still a chief hurdle in state-of-the-art coupled models, the initialization of decadal forecasts can lead to a better representation of the oceanic forcings of the WAM precipitation. Indeed, García-Serrano and Doblas-Reyes [2012] and García-Serrano et al.  have found that the AMV has a discernible predictive skill up to 3–6 years ahead when reforecasts are initialized from observations, compared with when they are driven only by radiative changes. This conclusion appears to be consistent among the ENSEMBLES multimodel and DePreSys. This added skill by initialization in reforecasting the AMV during boreal summer may lead to skillful predictions of rainfall in the WAM at those lead times. It is also important to note here the models' ability to simulate the AMV teleconnection to the Mediterranean basin (Figures 7-12) [Sutton and Hodson, 2005; van Oldenborgh et al., 2009, 2012]. Finally, our results also suggest that the Sahara heat low and land-atmosphere feedbacks associated with the Sahelian rainfall may be relevant to near-term predictions of the WAM activity. Particularly in CERFACS and IFM-GEOMAR, but also in DePreSys and NoAssim although to a lesser extent, the simulated Sahelian mode projects onto a coherent signature of surface temperature and precipitation anomalies at longer forecast times, which calls for attention to the low-level Saharan low [Haarsma et al., 2005; Biasutti et al., 2009] and soil moisture processes [Giannini et al., 2003, 2005; Kucharski et al., 2013].
 This study was supported by the Spanish MICINN-funded RUCSS project (CGL2010-20657) and the European Union's FP7-funded QWeCI project (ENV-2009-1-243964). The authors acknowledge the helpful comments from three anonymous reviewers. Technical support at Climate Forecasting Unit (IC3) is gratefully acknowledged. I. Polo has been supported by a postdoctoral fellowship funded by the Spanish Government.