While the North Pacific region has a strong influence on North American and Asian climate, it is also the area with the worst performance in several state-of-the-art decadal climate predictions in terms of correlation and root mean square error scores. The failure to represent two major warm sea surface temperature events occurring around 1963 and 1968 largely contributes to this poor skill. The magnitude of these events rivals that of the largest observed temperature anomalies in the twenty-first century that might be associated with the long-term warming. Understanding the causes of these major warm events is thus of primary concern for improving predictions of North Pacific, North American and Asian climate. The 1963 warm event stemmed from the propagation of a warm ocean heat content anomaly along the Kuroshio-Oyashio extension. The 1968 warm event originated from the upward transfer of a warm water mass centered at 200 m depth. Because they were associated with long-lived ocean heat content anomalies, we expect those events to be, at least partially, predictable. Biases in ocean mixing processes present in many climate prediction models seem to explain the inability to predict these two major events. Such currently unpredictable warm events, if occurring again in the next decade, would substantially enhance the effect of long-term warming in the region.
 Contrary to previous studies, which focused on a single-model set of climate predictions, our analyses rely on a wide ensemble of climate forecasting systems comprising the four involved in the ENSEMBLES project [Doblas-Reyes et al., 2010], the perturbed-parameter nine-member Met Office Decadal Climate Prediction System [Smith et al., 2007, 2010], and five of the participants in the CMIP5 project [Taylor et al., 2012]. We thus analyze decadal predictions ranging from the very first attempts to the very latest ones, which use various initialization techniques and ensemble generation methods. All those state-of-the-art dynamical forecast systems exhibit consistently weak performance in predicting the North Pacific multiannual variability, as we illustrate in Section 3 after describing those systems together with the observational data sets and the analysis methods in Section 2. This failure raises the question of whether the multiannual climate variability in this region is fundamentally unpredictable or whether this variability is driven by potentially predictable mechanisms that the current generation of climate models is unable to capture. To investigate the reasons for this particularly low skill, we identify and describe, in Section 4, two major warm events which are consistently missed by every climate forecast system. In Sections 5 and 6, we show that different mechanisms are responsible for the two events, based on an extensive set of eleven observational data sets. Section 7 returns to the climate predictions to attempt to explain why they exhibit such low skill in the North Pacific and points to the representation of ocean mixing processes as the most probable weakness of the current generation of climate models, though those ocean mixing biases might be linked to uncertainties in the ocean-atmosphere fluxes. Sections 8 and 9 respectively provide a discussion and conclusions.
2. Data and Methods
2.1. The Forecast Systems
 We assess the performance of the current generation of climate forecast systems from three multimodel ensembles, the characteristics of which are summarized in Table 1:
Table 1. Summary Table of the Ensembles of Decadal Hindcasts Used in This Article (a)
(a) The second column provides the list of models included in the ensemble. The third column provides the initialization technique. More details are provided in Section 2.
Assimilation in the coupled model of anomalies from ERA40 and ERAint reanalyses and ocean observations
Assimilation in the coupled model of gridded ocean subsurface observations of T and S
Assimilation in the coupled model of ocean anomalies of gridded subsurface observations of T and S
Assimilation in the coupled model of ocean anomalies of gridded subsurface observations of T and S
Full field initialization from NEMOVAR-ORAS4 ocean reanalysis and ERA40 (before 1989) and ERAint (after 1989) land/atmosphere reanalysis
9 variants of HadCM3
Nudging of anomalies in horizontal winds, atmospheric temperature, surface pressure, ocean temperature and salinity + online flux correction
full field initialization
full field initialization
full field initialization
Nudging of observed SST anomalies in the coupled model
 1. The first one comprises contributions to the CMIP5 project [Taylor et al., 2012] produced with the HadCM3 [Gordon et al., 2000; Pope et al., 2000], MRI-CGCM3 [Yukimoto et al., 2001], MIROC4, MIROC5 [Hasumi and Emori, 2004] and EC-Earth v2 [Hazeleger et al., 2010] coupled climate models, with 10, 9, 3, 6, and 5 ensemble members per start date, respectively. These contributions consist of 10-year-long hindcasts initialized from estimates of observed climate states every 5 years over the period 1960–2005. The initialization date varies from 1 November to the following 1 January depending on the forecast system. We thus consider that the first forecast year starts in the first January of the hindcast. This multimodel ensemble will be referred to as CMIP5 in the following. For five realizations of the EC-Earth v2 model, the hindcasts also include one start date every year over the 1960–2005 period.
 2. The second is a nine-member perturbed-parameter ensemble consisting of nine variants of the HadCM3 model [Hawkins and Sutton, 2009; Gordon et al., 2000] within the Decadal Climate Prediction System of the UK Meteorological Office [Smith et al., 2007, 2010]. The variants are obtained by simultaneously perturbing 29 atmosphere and sea-ice parameters [Murphy et al., 2004]. This set of hindcasts will be referred to as DePreSys. Ten-year hindcasts were started every year from 1960 to 2005, on 1 November, using an anomaly initialization technique [Robson, 2010].
 3. The third one comprises contributions to the ENSEMBLES project [Doblas-Reyes et al., 2010] with four different coupled ocean-atmosphere models. The experimental setup is the one later chosen for the decadal prediction exercise of CMIP5 and consists of 10-year-long ensemble dynamical hindcasts initialized once every five years over the period 1960–2005, on 1 November. Each model provides a three-member ensemble.
 Each near-term climate prediction produced with those systems is initialized from an estimate of the observed climate state and takes into account, in different ways, both natural and anthropogenic changes to the radiative forcing. The reader is referred to Table 1 for more details about the initialization techniques.
 When assessing the forecast system SST skill in Section 3, the model or observation climatology is defined as a function of lead time by averaging the hindcast SST across the start dates, using only hindcast values for which observations are available at the corresponding dates. The model climatologies obtained in this way are then subtracted from each raw hindcast to obtain anomalies over the whole hindcast period. The same method is applied to the observations to obtain anomalies over the whole observational period. The anomalies thus obtained are referred to as “per-pair” anomalies following Garcia-Serrano and Doblas-Reyes. For example, the computation of the “per-pair” climatologies to compare the CMIP5 ensemble-mean SST with the ERSST one does not take into account the hindcast that starts in 2005 for lead times longer than six years, since the ERSST data set ends in December 2011. However, we are still able to compute “per-pair” anomalies for lead times longer than six years for this hindcast by subtracting the climatology computed from the nine preceding hindcasts.
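The per-pair procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `per_pair_anomalies`, the dictionary layout, the use of yearly means, and the convention that lead 0 is the calendar year following the start year are all choices made here, not details taken from the forecast systems.

```python
def per_pair_anomalies(hindcasts, obs):
    """Return per-pair anomalies and climatologies.

    hindcasts: {start_year: [yearly mean at lead 0, lead 1, ...]}
    obs:       {calendar_year: yearly mean}; missing years are simply absent.
    Assumption: lead 0 is the calendar year following the start year.
    """
    n_leads = len(next(iter(hindcasts.values())))
    model_clim, obs_clim = [], []
    for lead in range(n_leads):
        # Keep only the (hindcast, observation) pairs for which an observation exists.
        pairs = [(values[lead], obs[start + 1 + lead])
                 for start, values in hindcasts.items()
                 if start + 1 + lead in obs]
        # Assumes at least one valid pair per lead time.
        model_clim.append(sum(h for h, _ in pairs) / len(pairs))
        obs_clim.append(sum(o for _, o in pairs) / len(pairs))
    # Every hindcast value gets an anomaly, even when no observation matches it.
    model_anom = {start: [values[lead] - model_clim[lead] for lead in range(n_leads)]
                  for start, values in hindcasts.items()}
    # Observed anomalies exist only where observations do.
    obs_anom = {start: [obs[start + 1 + lead] - obs_clim[lead]
                        if start + 1 + lead in obs else None
                        for lead in range(n_leads)]
                for start in hindcasts}
    return model_anom, obs_anom, model_clim, obs_clim


# Toy data: ten start dates every 5 years, 10 lead years, observations ending in 2011.
hindcasts = {start: [float(start + lead) for lead in range(10)]
             for start in range(1960, 2006, 5)}
obs = {year: 0.5 * (year - 1961) for year in range(1961, 2012)}
model_anom, obs_anom, model_clim, obs_clim = per_pair_anomalies(hindcasts, obs)
```

In this toy setup the 2005 hindcast has no observed counterpart beyond 2011, so it is excluded from the climatology at the longest leads, yet it still receives model anomalies there from the climatology of the nine earlier hindcasts.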
 The hindcast performance is assessed from the bias-corrected “per-pair” anomalies independently of the initialization technique employed. Indeed, up to now, there is no guarantee that anomaly initialization techniques remove initial drifts or shocks [Robson, 2010]. This analysis method avoids spurious disparities in the assessed performance arising from post-processing the data from different forecast systems inconsistently. Hindcast skill is measured using either the anomaly correlation coefficient or the root mean square error (RMSE). The significance level for the correlation skill is computed via a one-sided Student t-test which takes into account the autocorrelation of the time series.
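A one-sided t-test on a correlation with an autocorrelation adjustment could look like the sketch below. The text does not say how the autocorrelation is accounted for; the effective sample size formula used here, n_eff = n (1 - a_x a_y) / (1 + a_x a_y) with a_x and a_y the lag-1 autocorrelations, is a common choice and is an assumption of this example, as are all function names.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorr(x):
    """Lag-1 autocorrelation estimated as the correlation of x[t] with x[t+1]."""
    return pearson(x[:-1], x[1:])


def student_t_sf(t, dof, steps=2000):
    """Upper-tail probability P(T > t), by Simpson integration of the t density."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(u):
        return math.exp(log_c - (dof + 1) / 2 * math.log1p(u * u / dof))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def correlation_significance(x, y):
    """Return (r, effective sample size, one-sided p-value for r > 0).

    Assumption: effective sample size from the lag-1 autocorrelations,
    clamped to [3, n]. Not valid when |r| is exactly 1.
    """
    n = len(x)
    r = pearson(x, y)
    a = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = max(3.0, min(float(n), n * (1 - a) / (1 + a)))
    t = r * math.sqrt((n_eff - 2) / (1 - r * r))
    return r, n_eff, student_t_sf(t, n_eff - 2)


# Toy series: a slow oscillation and a noisier copy of it.
x = [math.sin(0.3 * i) for i in range(40)]
y = [xi + 0.5 * math.cos(1.7 * i) for i, xi in enumerate(x)]
r, n_eff, p_value = correlation_significance(x, y)
```

Because both toy series are strongly autocorrelated, the effective sample size is well below the 40 available points, which makes the test more conservative than one that treats the points as independent.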
 When considering the observational data sets independently of the forecast systems for process analysis in Sections 4 to 6, the period common to all the data sets, i.e. from January 1958 to June 1994, is used to compute their climatological annual cycle. This annual cycle is then subtracted from the whole data set to obtain the anomalies.
 For plotting purposes, a smoothing is performed as the last step of the data processing, with a 12-month running mean in Sections 3 to 6 and with a 6-month running mean in Section 7.
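The two processing steps just described, removal of a climatological annual cycle computed over a reference period followed by a running mean, can be sketched as follows for a single monthly time series. The function names and the simple non-centered running mean are assumptions of this example; the text does not specify how the window is aligned.

```python
def monthly_anomalies(values, first_year, first_month,
                      ref_start=(1958, 1), ref_end=(1994, 6)):
    """Subtract the mean annual cycle computed over [ref_start, ref_end].

    values: monthly values, starting at (first_year, first_month).
    Assumes every calendar month occurs at least once in the reference period.
    """
    dates = []
    for i in range(len(values)):
        k = (first_month - 1) + i
        dates.append((first_year + k // 12, k % 12 + 1))
    sums, counts = [0.0] * 12, [0] * 12
    for (year, month), v in zip(dates, values):
        if ref_start <= (year, month) <= ref_end:
            sums[month - 1] += v
            counts[month - 1] += 1
    clim = [s / c for s, c in zip(sums, counts)]
    return [v - clim[month - 1] for (_, month), v in zip(dates, values)]


def running_mean(values, window=12):
    """Running mean over `window` consecutive values (output is shorter than input)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy series from January 1958 to December 1999: a pure seasonal cycle,
# shifted by +1 after the end of the reference period.
toy = [float(i % 12) + (1.0 if (1958 + i // 12, i % 12 + 1) > (1994, 6) else 0.0)
       for i in range(504)]
toy_anom = monthly_anomalies(toy, 1958, 1)
toy_smooth = running_mean(toy_anom, window=12)
```

With this toy input the anomalies are zero throughout the reference period and equal to the imposed shift afterwards, which is the expected behavior when the climatology is taken from the common period only.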
3. State-of-the-Art Climate Forecast Skill
 In the CMIP5 ensemble, the North Pacific region stands out, along with the Southern Ocean, as the region with the poorest anomaly skill score globally, for hindcasts averaged over the forecast time 2–5 years (Figure 1a). This characteristic appears in each individual forecast system included in the CMIP5 ensemble (Figure S1 in auxiliary material Text S1) as well as the ENSEMBLES and DePreSys ensembles (Figure S2 in auxiliary material Text S1). To compare quantitatively the performance of the CMIP5 hindcasts in the different world oceans, we compute the combined spatial and temporal correlation (Figure 1b) and RMSE (Figure 1c) of the predicted against observed SST anomalies as a function of start date, for each ocean after applying a 12-month running mean.
In these scores, SSTano_obs and SSTano_mod stand for the observed and modeled “per-pair” SST anomalies computed as described in Section 2; lonmin, lonmax, latmin and latmax correspond to the geographical limits defined for a given region; forecast1 and forecast10 correspond to the first and tenth forecast years, respectively; and t is the forecast time. Those scores quantify the ability of the CMIP5 hindcasts to reproduce the spatial patterns rather than the basin-averaged yearly SST anomalies as a function of the forecast time. The best performance is found in the Indian Ocean, where the correlation reaches about 0.4–0.6 and the RMSE about 0.2–0.3°C. The Southern Ocean consistently has the lowest correlation across the hindcasts but also tends to have the lowest RMSE. The North Pacific Basin correlation is only slightly above that of the Southern Ocean and its RMSE tends to be the highest. Taking both the RMSE and correlation scores into account, the North Pacific Basin thus appears as the region where the state-of-the-art decadal climate predictions perform the worst worldwide, followed closely by the South Pacific region.
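A combined spatial and temporal score of this kind can be sketched by pooling every grid point of the region and every forecast time of one hindcast into a single sample. The sketch below is an assumption-laden illustration: it uses a centered Pearson correlation and no area weighting, neither of which is confirmed by the text, and the function name and nested-list layout are hypothetical.

```python
import math


def pooled_scores(mod, obs):
    """Correlation and RMSE pooled over forecast time, latitude and longitude.

    mod, obs: nested lists indexed as [forecast_time][lat][lon], restricted to
    one region and one start date; None marks a missing value.
    Assumptions: centered correlation, no area weighting.
    """
    pairs = [(m, o)
             for mod_t, obs_t in zip(mod, obs)
             for mod_row, obs_row in zip(mod_t, obs_t)
             for m, o in zip(mod_row, obs_row)
             if m is not None and o is not None]
    n = len(pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    corr = cov / math.sqrt(var_m * var_o)
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in pairs) / n)
    return corr, rmse


# Toy example: 2 forecast times on a 2 x 2 grid, with one missing observation.
obs_toy = [[[0.0, 0.25], [0.5, 0.75]], [[1.0, 1.25], [1.5, None]]]
mod_toy = [[[None if o is None else o + 0.5 for o in row] for row in field]
           for field in obs_toy]
corr_toy, rmse_toy = pooled_scores(mod_toy, obs_toy)
```

The toy forecast reproduces the observed pattern exactly but with a uniform offset, so the pooled correlation is perfect while the RMSE equals the offset. This is the kind of contrast between the two scores that the Southern Ocean results illustrate.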
 The forecast time series of ensemble-mean smoothed SST anomalies from each CMIP5 climate forecast system averaged over the region of lowest SST skill (155°E–235°E, 10°N–45°N) are shown in Figure 1d with one color per start date, together with their observed counterparts in black. The multimodel ensemble mean SST anomalies are shown with thick lines. In the forecasts starting between 1970 and 2005, the forecast systems generally follow the warming evolution of the observed anomalies. The ensemble nevertheless misses some large excursions from the warming trend, like the very sharp one around 1987 and the wider one around 1999, and tends to underestimate the slowdown of the warming in the twenty-first century, except in one of the forecast systems. The performance is generally much poorer in the 1960s. The ensemble-mean prediction from each forecast system systematically misses the two major warm events which peak in 1963 and 1968 (Figures S3 and S4 in auxiliary material Text S1). Even in the hindcasts initialized in November 1961 and 1966 performed with the EC-Earth (Figure S3 in auxiliary material Text S1) and DePreSys (Figure S4 in auxiliary material Text S1) forecast systems, the warm anomalies are damped in less than six months. A few ensemble members do seem able to capture those warm anomalies, but they are largely outnumbered.
 The failure to predict these two major warm events largely contributes to the particularly low skill in SST in the North Pacific region. Excluding the 1960s hindcasts substantially increases this skill (Figure 1e and Figures S5 and S6 in auxiliary material Text S1) in the areas where the SST skill is poor (Figure 1a), although it also reduces this skill in some other areas such as the south-eastern North Pacific. Since the forecast systems' failure to represent the 1963 and 1968 warm events stands as their most striking failure over the whole hindcast period, we focus in the following on the causes of these warm events and on why the forecast systems fail to capture them.
4. The Major Warm Events of 1963 and 1968
 The major warm events of 1963 and 1968, peaking at 0.3–0.4°C (Figure 2a), are the largest on record. Although they might appear to be small, they rival the SST anomalies that might be associated with the long-term warming in the recent past in this region. These events are not related to the well-known modes of variability dominating the North Pacific climate: the PDO, the PNA, and ENSO [Rasmusson and Carpenter, 1982]. The events still appear after filtering out the effect of these modes by a multilinear regression at a range of lags from −1 year to +1 year at the grid point level (Figure 3). Large SST anomalies are associated with each one of these modes at the grid point level. However, as these SST anomalies have opposite signs across the North Pacific, averaging over the North Pacific Ocean makes their integrated impact relatively small. The warm events also persist after removing the effects of ENSO and volcanic eruptions [Thompson et al., 2010, Figure 3].
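The filtering step can be illustrated with a lagged multilinear regression at a single grid point. The sketch below is an assumption about how such a filter might be built, not the procedure actually used: it regresses the local series on a constant and on each climate index at each lag by ordinary least squares (solved through the normal equations) and keeps the residual. All function names are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular (no guard against a zero pivot)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def remove_lagged_modes(series, indices, lags):
    """Regress `series` on every index at every lag; return (residual, coefficients).

    series:  values at one grid point
    indices: list of index time series (e.g. one per climate mode), same length
    lags:    integer lags in time steps; a lag L uses the index value at t - L
    Only the time steps where all lagged predictors exist are used and returned.
    """
    max_lag = max(abs(lag) for lag in lags)
    times = range(max_lag, len(series) - max_lag)
    rows = [[1.0] + [idx[t - lag] for idx in indices for lag in lags] for t in times]
    y = [series[t] for t in times]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return [v - f for v, f in zip(y, fitted)], beta


# Toy check: a series fully explained by one index lagged by one step.
rng = random.Random(0)
index = [rng.gauss(0.0, 1.0) for _ in range(60)]
series = [0.0] + [0.2 + 2.0 * index[t - 1] for t in range(1, 60)]
residual, beta = remove_lagged_modes(series, [index], [-1, 0, 1])
```

In the toy check the residual vanishes because the series contains nothing but the lagged index. An anomaly that survives such a filter, as the 1963 and 1968 events do, is one that the chosen indices cannot explain linearly within the lag window.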
 These two major warm events occur during a period in which the surface heat flux anomalies averaged over 155°E–235°E, 10°N–45°N are directed from the ocean to the atmosphere (Figure 2b). The DS94 turbulent (in blue) and total (in red) heat fluxes are close to one another. The heat exchange between the ocean and the atmosphere is therefore dominated by turbulent heat fluxes. Around 1963 and 1968, the OAFluxes (in green) and DS94 turbulent and total surface heat fluxes show peaks corresponding roughly to the ones observed in the SST time series. This suggests that the ocean anomalies might have forced the atmosphere during these two major warm events. This behavior is not systematic in the region. The correlations between the ERSST anomalies and the DS94 total, the DS94 turbulent and the OAFluxes turbulent flux anomalies reach 0.5, 0.45 and 0.35, respectively. For example, the cooling around 1999 (Figure 2a), also missed by the forecast systems (Figure 1d), coincides with a peak in surface heat fluxes from the ocean toward the atmosphere (Figure 2b), which suggests that the atmosphere was forcing the ocean anomalies during this event. Note, however, that the DS94 and OAFluxes estimates of turbulent heat fluxes disagree, in particular during the 1963 event, when the peak occurs one year earlier and has a lower amplitude in OAFluxes than in DS94. The OAFluxes data set thus suggests a late contribution of the atmosphere in amplifying the initial warm ocean anomaly.
5. Horizontal Advection of the 1963 Heat Anomaly
 During 1963, ERSST SST anomalies in the study domain (Figure 4a) show a large positive anomaly confined to 180°E–205°E, 35°N–45°N and peaking at about 1.5°C, while weak anomalies cover the rest of the domain. The Hovmöller diagram (Figure 4b) of SST anomalies averaged in the 35°N–45°N latitude band from November 1960 to December 1963 suggests an eastward propagation of the anomaly at roughly 20° per year. This propagation also appears in Hovmöller diagrams of HadISST SST anomalies, NCEP and ERA40 near-surface air temperature anomalies and DS94 total and turbulent heat flux anomalies (Figure S7 in auxiliary material Text S1). Computation of backward trajectories launched in the 180°E–205°E, 35°N–45°N domain using the Ariane Lagrangian trajectory software [Blanke et al., 2001; Van Roekel et al., 2009; Getzlaff et al., 2006] confirms that the rate of propagation is consistent with advection of particles along the Kuroshio-Oyashio extension. Some trajectories are illustrated (Figure 4c) for particles launched in April 1963. Most of the particles originate in the 140°E–170°E longitude band, which corresponds to the original location of the warm anomaly seen in the Hovmöller diagram (Figure 4b). Note that the initial warm anomaly in the western North Pacific basin might have been triggered by a previous El Niño event, since our filtering method in the previous section only considered a 2-year window around an ENSO peak. However, understanding the origin of the warm anomaly before the date of initialization of the forecast systems is beyond the scope of this article.
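To relate the propagation rate read from the Hovmöller diagram to an ocean current speed, the rate in degrees of longitude per year can be converted to metres per second. The short sketch below assumes a spherical Earth of radius 6371 km and evaluates the conversion at 40°N, the middle of the 35°N–45°N band; both choices, and the function name, are assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6.371e6            # mean Earth radius, spherical approximation
SECONDS_PER_YEAR = 365.25 * 86400.0


def zonal_speed_m_per_s(deg_lon_per_year, lat_deg):
    """Convert an eastward propagation rate in degrees of longitude per year
    into a zonal speed in m/s at the given latitude."""
    metres_per_degree = (math.radians(1.0) * EARTH_RADIUS_M
                         * math.cos(math.radians(lat_deg)))
    return deg_lon_per_year * metres_per_degree / SECONDS_PER_YEAR


# Roughly 20 degrees per year at the centre of the 35N-45N band.
speed = zonal_speed_m_per_s(20.0, 40.0)
```

Under these assumptions, 20° per year corresponds to a speed of roughly 0.05 m/s, i.e. a few centimetres per second.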
 The ocean vertical profiles of the annual temperature anomalies averaged over 180°E–205°E, 35°N–45°N (Figure 4d) show an anomalous heat reservoir extending down to roughly 300 m that builds up progressively from 1961 to 1963. A cold anomaly is present below this depth but does not seem to change substantially in this period. The amplitude of the SST anomaly experiences a seasonal cycle along its propagation, with maximum amplitude in late winter (Figure 4b). Superimposed on these seasonal variations, the SST and near-surface air temperature anomalies also seem to increase along their propagation (Figure 4b and Figure S7 in auxiliary material Text S1). The smaller long-term mean mixed layer depth in the central part of the North Pacific relative to the western part (Figure 5a) confines the heat content anomaly to an increasingly thin layer along its pathway, which could favor the increase in SST anomaly.
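The last point can be illustrated with a slab argument. As a simplifying assumption made here for illustration only, suppose that the heat content anomaly per unit area, Q', is conserved along the pathway and uniformly mixed over a layer of thickness h, with \rho_0 and c_p the reference density and specific heat of seawater (all symbols are introduced for this sketch and do not come from the analysis above). Then:

```latex
\[
T' = \frac{Q'}{\rho_0 \, c_p \, h}
\qquad\Longrightarrow\qquad
\frac{T'_{\mathrm{central}}}{T'_{\mathrm{west}}}
  = \frac{h_{\mathrm{west}}}{h_{\mathrm{central}}}
\]
```

Under this idealization, halving the layer thickness doubles the temperature anomaly. Surface heat loss and exchanges with the water below the layer, which are neglected here, would reduce the effect.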
6. Upward Transfer of the 1968 Heat Anomaly
 During 1968, the SST anomalies (Figure 6a) show a large warm feature of amplitude 0.8°C over 170°E–235°E, 10°N–35°N and a secondary peak in the north-western part with large amplitude but much smaller extent that explains only 10% of the total anomaly. The Hovmöller diagram of SST anomalies averaged in the 35°N–45°N latitude band from November 1965 to December 1968 does not show any particular propagating feature (not shown). The different spatial patterns of SST anomalies during 1963 (Figure 4a) and 1968 (Figure 6a) and the lack of a propagating feature along the Kuroshio-Oyashio extension from 1965 to 1968 suggest that the 1963 and 1968 events are dynamically distinct. The monthly profiles of temperature anomalies from April 1966 to April 1967 (Figure 6b) show a warm anomaly centered at 200 m depth at the end of winter 1966, which persists below the mixed layer during the summer. This warm anomaly is transferred to the top 150 m by April 1967. Since this upward transfer occurs in an area of large-scale downwelling, it is more likely caused by turbulent mixing processes through the re-emergence mechanism of SST anomalies previously observed in the North Pacific Ocean [Alexander et al., 1999, 2008]. The warm anomaly is later amplified, most probably by the atmospheric weather “noise”, which favors a stabilization of the vertical profile (Figure 6b) either through a decrease in wind speed in this region during the years 1967 and 1968 (Figure 6c) or through an Ekman-induced shoaling of the mixed layer (Figure 6d). Note, however, that those two atmospheric data sets bear large uncertainties.
7. Potential Causes for the Forecast Systems to Miss Those Events
 The 1963 and 1968 warm events were associated with long-lived ocean heat content anomalies. Since the 1963 anomaly fed the atmosphere during its propagation along the Kuroshio-Oyashio extension (Figure 2b) and thus persisted against the atmospheric damping, we expect the large 1963 warming to be predictable. As long as the original deep 1968 anomaly is isolated from the surface, we also expect the ocean to be able to sustain it. However, when this anomaly reaches the surface, its amplification is controlled by atmospheric weather “noise”. We thus expect the 1968 event to be predictable, although not its maximum amplitude.
 Which potentially misrepresented processes in the forecast systems might then be responsible for their failure to represent those events? A vertical section of temperature anomalies at 146°E in the 35°N–45°N latitude band in the initial conditions of the climate forecast initialized in November 1960 with the EC-Earth forecast system (Figure 7) shows a warm anomaly extending down to 500 m with a meridional extent of 10°. The warm anomalies peak at 6°C at 75 m, while the surface anomaly reaches only 1°C. The monthly sea surface temperature anomalies during the first six months of this climate forecast (Figure 8) show a warm anomaly which seems to travel eastward at an approximate speed of 5° of longitude in 6 months, i.e. slightly slower than the observed advective propagation (Figure 4b). This anomaly grows weaker and weaker along its propagation path, contrary to what occurs in the observations (Figure 4b), and it vanishes after six months, although two of the members are able to sustain it for about one and a half years (not shown). A similar behavior can be observed in all the forecast systems considered in this study: the warm anomaly (Figures S8 and S9 in auxiliary material Text S1) is initially present but vanishes with different e-folding times depending on the forecast system. Note, though, that the CMIP5 hindcasts have been initialized at different dates between 1 November 1960 and 1 January 1961. A comparison in January 1961 is thus not a perfectly fair comparison of their performance (Figure S9 in auxiliary material Text S1). The turbulent surface heat flux anomalies in the EC-Earth forecast initialized in November 1960 (Figure 9a) are larger than the observed ones over the core of the initial warm anomaly during the first six months of the hindcast. Those excessively large surface heat flux anomalies contributed to damping this anomaly. However, in the forecasts initialized in 1961 and 1962, the heat flux anomalies tend to be lower than the observed ones, yet the SST anomalies are also damped in a few months. Although they contribute to damping the warm anomaly, errors in surface heat fluxes thus do not seem to be the main cause. The inability of the forecast systems to sustain the warm anomaly might rather come from the generalized strong biases in mixed layer depth and in turbulent and mesoscale eddy mixing processes in climate models in this region [Lienert et al., 2011] (Figure 5). An inaccurate representation of the warm anomaly in the initial conditions could also contribute to its particularly quick disappearance in some of the forecast systems.
 A horizontal section of temperature anomalies at 200 m depth in the initial conditions of the EC-Earth climate forecast initialized in November 1965 (Figure 10) shows a warm anomaly that peaks at 2.5°C, with a shape similar to that of the SST anomaly observed in 1968 (Figure 6a). The simulated turbulent surface heat flux anomalies (Figure 11) are weaker than the observed ones in the forecasts initialized in 1965, 1966 and 1967. The damping of the warm anomaly thus more likely comes from oceanic processes. The monthly “per-pair” climatologies of the EC-Earth forecast heat content anomaly averaged in the 170°E–235°E, 10°N–45°N, 100–300 m box (Figure 12), where the initial heat content anomaly is located in 1966, show a strong drift with respect to the NEMOVAR reanalysis from the first summer. An initial transfer of heat content from the intermediate layers toward the surface layers is systematically observed in the forecasts produced with the EC-Earth climate model [Du et al., 2012]. Although the hindcasts have been detrended a posteriori following the method described in Section 2.3, the interaction between the large drift and the superimposed simulated variability stands as a strong obstacle for the forecast system to predict the 1968 warm anomaly.
8. Discussion
 To describe the 1963 and 1968 events, we performed our analyses on a set of 11 observational data sets. However, during the 1960s, the observations were sparser than they are nowadays. The sea surface temperature observations were frequent in this region thanks to the commercial shipping lines, but few ocean thermodynamic profiles sampled the deep ocean. The NEMOVAR ocean reanalysis stands as a physical extrapolation of this sparse available information. However, the ocean model on which this physical extrapolation is based shares the same typical biases as the state-of-the-art ocean models included in the forecast systems whose performance we assessed in this article. Indeed, the ocean heat budgets we computed to investigate the mechanisms explaining the 1963 and 1968 events are not closed. The assimilation term constitutes a substantial contribution allowing the warm anomaly to persist along its propagation. The sparse observational coverage, the biases in the ocean model used for the physical extrapolation and the uncertainty in the atmospheric forcing fluxes are the main sources of uncertainty on the processes we identified as being involved in the development of the 1963 and 1968 events. However, those observational data sets will be highly challenging to improve for this particular period. A more accurate assessment of the mechanisms leading to the 1963 and 1968 events will require the ocean models to improve in such a way that the assimilation increments become substantially smaller than the other terms of the heat budget.
9. Conclusions
 In this work, we have used a wide variety of climate forecast systems to show that the North Pacific region is the area where the state-of-the-art decadal climate predictions of sea surface temperature (SST) perform the worst worldwide for forecast times ranging from the second to the fifth year, according to correlation and root mean square error (RMSE) measures. This systematic error is dominated by the models' inability to capture two major warm events in the 1960s. Based on an extensive set of 11 observational data sets, we investigated the mechanisms explaining those large warm events. We suggest that the 1963 one stemmed from the propagation of a warm anomaly along the Kuroshio-Oyashio extension. The 1968 warm event originated from the upward transfer of a warm water mass centered at 200 m depth. Over the whole hindcast period considered in the framework of the ENSEMBLES and CMIP5 projects, those two large warm events are unique and extreme. Their magnitudes rival the largest observed temperature anomalies in the twenty-first century that might be associated with the long-term warming. We show that the initial warm anomaly vanishes in every forecast system and hypothesize that the generalized model biases in ocean mixing processes might be responsible for damping the associated heat content anomalies, though those ocean mixing biases might be linked to uncertainties in the ocean-atmosphere fluxes. Accurately representing the ocean mixing processes in ocean general circulation models is a priority, since the occurrence of such a warm event in the next decade could jeopardize climate prediction in the North Pacific as attempted by Mochizuki et al. Although reducing systematic biases in ocean stratification and improving the representation of ocean mixing processes have been long-standing efforts, our conclusions suggest that resources devoted to improving the simulation of ocean mixing have the potential to significantly improve decadal climate prediction.
 Joan Ballester is gratefully acknowledged for interesting discussions about our results. This work was supported by the EU-funded QWeCI (FP7-ENV-2009-1-243964) and CLIM-RUN (FP7-ENV-2010-1-265192) projects, the MICINN-funded RUCSS (CGL2010-20657) project and the Catalan Government. The authors wish to thank the three reviewers for their fruitful suggestions. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Red Española de Supercomputación (RES).