This study assesses the CMIP5 decadal hindcast/forecast simulations of seven state-of-the-art ocean-atmosphere coupled models. Each decadal prediction consists of 10-year simulations initialized every five years from climate states of 1960/1961 to 2005/2006. Most of the models overestimate trends: they predict less warming or even cooling in the earlier decades compared to observations, and too much warming in recent decades. All models show high prediction skill for surface temperature over the Indian, North Atlantic and western Pacific Oceans, where the externally forced component and low-frequency climate variability are dominant. However, low prediction skill is found over the equatorial and North Pacific Ocean. The Atlantic Multidecadal Oscillation (AMO) index is predicted in most of the models with significant skill, while the Pacific Decadal Oscillation (PDO) index shows relatively low predictive skill. The multi-model ensemble has in general better forecast quality than the single-model systems for global mean surface temperature, AMO and PDO.
 The prediction of decadal climate variability against a background of global warming is one of the most important and challenging tasks in climate science. Not only does natural variability have a large-amplitude influence over broad regions of the globe, it is an integral component of climate variability that modulates low-frequency climate phenomena as well as extreme climate events such as tropical cyclone activity. On decadal timescales, some aspects of internal climate variability may be predictable [Collins and Allen, 2002; Smith et al., 2007; Keenlyside et al., 2008; Meehl et al., 2009, 2010; Pohlmann et al., 2009; Mochizuki et al., 2012]. However, the actual prediction skill of natural climate variability on decadal timescales using various current climate models has received little attention [van Oldenborgh et al., 2012].
 The Coupled Model Intercomparison Project Phase 5 (CMIP5) has devised an innovative experimental design to assess the predictability and prediction skill on decadal time scales of state-of-the-art climate models, in support of the Intergovernmental Panel on Climate Change (IPCC) 5th Assessment Report [Taylor et al., 2012]. The decadal predictability and prediction skill of individual models have been analyzed separately for multi-year prediction horizons over different time periods and regions [Pohlmann et al., 2009; Fyfe et al., 2011; Chikamoto et al., 2012; Mochizuki et al., 2012]. However, the CMIP5 decadal predictions from different models have not been evaluated and compared using the same evaluation metrics. Choosing one model over another, or combining sets of models in a multi-model ensemble (MME), requires information that compares the predictions of individual models. Here, we compare the ability of currently available CMIP5 decadal hindcasts to simulate the mean climate and decadal climate variability, both for individual coupled models and for a multi-model ensemble. We focus on the surface temperature and two dominant internal climate modes: the Atlantic Multidecadal Oscillation (AMO) and Pacific Decadal Oscillation (PDO). This study addresses how well the CMIP5 multi-model decadal hindcasts simulate the spatio-temporal climate variability.
2. Data and Models
 This study compares the CMIP5 decadal hindcasts and forecasts conducted by seven modeling centers. Each decadal prediction includes at least 3 and as many as 10 ensemble members generated from slightly different initial conditions. The data consist of simulations over a 10-year period that are initialized every five years during the period 1960/1961 to 2005/2006 [Taylor et al., 2012]. A brief summary of each model's experimental configuration is presented in Table S1 (see Text S1 in the auxiliary material). The annual mean refers to the average from January to December for each year.
 Surface temperature data from the ERA40 [Uppala et al., 2005] before 1979 and from the ERA Interim [Berrisford et al., 2009] after 1979 are used to evaluate the predictions. The Extended Reconstructed Sea Surface Temperature Version 3 (ERSST.v3b) [Smith et al., 2008] is used to define the PDO and AMO indices. All data from model hindcasts and observations are interpolated to a common horizontal resolution of 2.8125° in longitude and latitude. For the observations, the long-term mean is removed by subtracting climatological means for the entire period from 1960 to 2010. The model forecast anomaly is calculated as $Y'_{j\tau} = Y_{j\tau} - \bar{Y}_{\tau}$, where $Y$ is the ensemble-average prediction, $Y'$ is the anomaly of the raw forecast with respect to the forecast average $\bar{Y}_{\tau} = \frac{1}{n}\sum_{j=1}^{n} Y_{j\tau}$, $j$ is the starting year ($j = 1, 2, \ldots, n$; $n = 10$) and $\tau$ is the forecast lead year. $\bar{Y}_{\tau}$ is calculated only over the period when observational data are available. The equally weighted average of all 52 ensemble members from the seven hindcast experiments provides the values for the MME.
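The anomaly definition and the pooled ensemble average described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function names `forecast_anomaly` and `pooled_mme`, the array layout (start dates along the first axis, lead years along the second) and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def forecast_anomaly(hindcast, has_obs=None):
    """Return (Y', Y_bar) with Y'[j, tau] = Y[j, tau] - Y_bar[tau].

    hindcast : (n_starts, n_leads) ensemble-mean predictions (assumed layout).
    has_obs  : optional boolean mask of the same shape, True where a verifying
               observation exists; only those entries enter the forecast average.
    """
    y = np.asarray(hindcast, dtype=float)
    if has_obs is None:
        mask = np.ones(y.shape, dtype=bool)
    else:
        mask = np.asarray(has_obs, dtype=bool)
    # Lead-dependent forecast average over the start dates that can be verified.
    y_bar = np.nanmean(np.where(mask, y, np.nan), axis=0)
    return y - y_bar, y_bar

def pooled_mme(member_arrays):
    """Equally weighted mean over all members of all models.

    member_arrays : list of arrays shaped (n_members_k, ...), one per model.
    """
    pooled = np.concatenate(
        [np.asarray(m, dtype=float) for m in member_arrays], axis=0
    )
    return pooled.mean(axis=0)

# Toy hindcast: a start-date signal plus a drift that grows with lead year.
signal = np.linspace(-0.3, 0.3, 10)      # one value per start date
drift = 0.05 * np.arange(1, 11)          # one value per lead year
toy = signal[:, None] + drift[None, :]
anom, y_bar = forecast_anomaly(toy)

# Two hypothetical models with 3 and 1 members.
mme = pooled_mme([np.ones((3, 2)), 5.0 * np.ones((1, 2))])
```

In the toy case the lead-dependent drift ends up entirely in `y_bar`, so the anomalies retain only the start-date signal. Pooling members with equal weight also means that a model contributing more members carries proportionally more weight in the MME.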
3. Prediction Skill Assessment for CMIP5 Decadal Hindcasts
3.1. Global Surface Temperature
 The model prediction skill is examined by comparing the annual mean surface temperature from the observations and the hindcasts of each model. Figure 1 shows the evolution of the annual global surface temperature (Yjτ) from the reanalysis from 1960 to 2010 and the ensemble mean of each model's 10-year hindcast/forecast from 1961 to 2015 (initialized every 5 years). Most of the models simulate lower global mean surface temperature than the reanalysis during the entire period, except MIROC4h (Figure 1d), MIROC5 (Figure 1e) and CFSv2 (Figure 1g). Several models, including CanCM4 (Figure 1b) and CNRM (Figure 1c), which are initialized close to the observed state (full field initialization), drift towards the model climate. Models that are initialized with anomaly assimilation (anomaly initialization) also show climate drift during predictions (MIROC4h, MIROC5 and MRI: Figures 1d–1f, respectively). Most of the models appear to overestimate the 1960–2010 trend in their hindcasts. Removing the average along the actual time (Y′jτ) leads to a skewed outcome, whereby the MME (Figure S1 in Text S1) and each model (not shown) predict less warming or even cooling compared to the reanalysis in the earlier decades, and too much warming in recent decades. Figure 2 shows the observed and predicted trend (slope of a linear regression) in Y′jτ as a function of forecast lead time (τ) after applying a four-year average to filter out the high-frequency variability. The observed trend is calculated in the same manner as the trend in the hindcasts. The systematic overestimation of the trend throughout the integration period is obvious in all hindcasts except CFSv2.
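The trend diagnostic described for Figure 2 (a regression slope as a function of lead time after four-year averaging) can be sketched as follows. The function name `lead_window_trends`, the array layout and the toy data are assumptions for illustration. The sketch also assumes that the four-year average is taken over consecutive lead years (1–4, 2–5, and so on), which the text implies but does not spell out.

```python
import numpy as np

def lead_window_trends(anom, start_years, window=4):
    """Regression slope across start dates for each four-year lead window.

    anom        : (n_starts, n_leads) anomalies Y'[j, tau] (assumed layout).
    start_years : (n_starts,) calendar year of each initialization.
    Returns one slope per window (leads 1-4, 2-5, ...), in units per year.
    """
    anom = np.asarray(anom, dtype=float)
    years = np.asarray(start_years, dtype=float)
    centered = years - years.mean()   # centering improves the fit's conditioning
    n_windows = anom.shape[1] - window + 1
    slopes = np.empty(n_windows)
    for first in range(n_windows):
        # Average over the lead years in this window, then fit a line in time.
        window_mean = anom[:, first:first + window].mean(axis=1)
        slopes[first] = np.polyfit(centered, window_mean, 1)[0]
    return slopes

# Toy anomalies warming at 0.02 per year of verification time.
start_years = np.arange(1960, 2010, 5)   # 1960, 1965, ..., 2005
leads = np.arange(1, 11)
toy_anom = 0.02 * (start_years[:, None] + leads[None, :] - 1990.0)
trends = lead_window_trends(toy_anom, start_years)
```

Applying the same function to observed anomalies sampled at the matching verification years gives the reference curve, which mirrors the statement that the observed trend is computed in the same manner as the hindcast trend.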
 To examine the prediction skill of the individual models and also the MME, the mean surface temperature anomaly field (Y′jτ) averaged over lead times of 1 year, 2–5 years and 6–9 years is compared with the reanalysis (Figure 3 and Figure S2 in Text S1). We measure the prediction skill in terms of the anomaly correlation coefficient:

$$r_{\tau} = \frac{\sum_{j=1}^{n} Y'_{j\tau}\, O'_{j\tau}}{\sqrt{\sum_{j=1}^{n} Y'^{2}_{j\tau}\, \sum_{j=1}^{n} O'^{2}_{j\tau}}}$$

where the $O'$ terms are the observed anomaly field. The correlation coefficients are calculated over the ensemble mean for the hindcasts of each model. The results are almost the same if the observed anomaly is calculated based on the climatology Ōτ.
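As a concrete illustration of the skill measure, the sketch below computes an anomaly correlation across start dates at every grid point. It assumes the common uncentered form (sum of products divided by the root of the product of the sums of squares). The function name `anomaly_correlation` and the toy series are hypothetical.

```python
import numpy as np

def anomaly_correlation(fcst_anom, obs_anom):
    """Uncentered anomaly correlation along axis 0 (start dates).

    Both inputs share a shape such as (n_starts,) or (n_starts, n_lat, n_lon);
    any trailing grid axes are preserved in the result.
    """
    f = np.asarray(fcst_anom, dtype=float)
    o = np.asarray(obs_anom, dtype=float)
    numerator = np.sum(f * o, axis=0)
    denominator = np.sqrt(np.sum(f ** 2, axis=0) * np.sum(o ** 2, axis=0))
    return numerator / denominator

# Toy observed anomalies for ten start dates at a single location.
obs = np.array([-0.4, -0.3, -0.1, 0.0, 0.1, 0.1, 0.2, 0.4, 0.5, 0.6])
acc_perfect = anomaly_correlation(obs, obs)     # forecast identical to observations
acc_flipped = anomaly_correlation(-obs, obs)    # forecast with the wrong sign
```

Because the reduction runs over the first axis only, passing gridded anomalies yields a skill map of the kind shown in Figure 3 without any change to the function.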
 All models show high skill (greater than the 95% confidence level) in forecasting surface temperature anomalies over the Indian, North Atlantic and the western Pacific Oceans up to 6–9 years (Figure 3 and Figure S2 in Text S1). However, the equatorial Pacific and North Pacific Ocean regions show less prediction skill after 2–5 years (Figure 3). For 6–9 years, the predictive skill does not change much compared to 2–5 years (Figure S2 in Text S1). Relatively long-lasting prediction skill appears over regions where the externally forced component and low-frequency climate variability are dominant [Keenlyside et al., 2008; Meehl et al., 2009; Pohlmann et al., 2009; Chikamoto et al., 2012; Mochizuki et al., 2012; van Oldenborgh et al., 2012]. High prediction skill also occurs for tropical Atlantic SST, which is an important factor in climate variability in that region and beyond [Keenlyside et al., 2008]. Comparing the globally averaged skill of each model's hindcasts shows that the highest skill occurs for the MME over the entire period (not shown). The relatively low skill in MIROC4h and CFSv2 is possibly due to their smaller number of ensemble members (3 and 4 members, respectively) compared to the other models (6–10 members), since a larger number of ensemble members generally yields higher skill in the ensemble mean.
3.2. Decadal Climate Variability
 To examine the prediction skill of natural internal modes of climate variability, the simulated AMO and PDO indices are compared with observations. The AMO and PDO are the dominant decadal oscillations over the North Atlantic Ocean [Schlesinger and Ramankutty, 1994; Enfield et al., 2001] and North Pacific Ocean [Mantua et al., 1997], respectively, and are the most predictable components of internal climate variability [e.g., Keenlyside et al., 2008; Mochizuki et al., 2010]. The AMO index is defined as the area-averaged annual mean sea surface temperature (SST) anomaly over the North Atlantic from 80°W to 20°W and from 0° to 70°N for both the simulations and observations. The simulated PDO index is defined as the normalized time series obtained by projecting the predicted annual mean SST anomaly in the North Pacific Ocean poleward of 20°N onto the leading EOF spatial pattern of the observed annual mean SST anomaly. In both the observations and the model hindcasts, the SST anomalies are detrended before calculating these indices to remove the externally forced variation [van Oldenborgh et al., 2012]. A four-year running average is applied to both indices to filter out higher interannual frequencies. Figure S3 in Text S1 shows the variation of the AMO and PDO indices from observations and the MME hindcast. Both indices show strong decadal variability. The gray shades in Figure S3 represent the ranges of one standard deviation of the ensemble mean in each hindcast.
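The index definitions above can be sketched with a few small NumPy helpers. This is a simplified illustration under stated assumptions, not the authors' processing chain. The helper names, the cos(latitude) area weighting, longitudes expressed in degrees east from -180 to 180, a grid without missing (land) values, and a precomputed EOF pattern passed in as `pattern` are all assumptions.

```python
import numpy as np

def box_mean(sst, lat, lon, lat_bounds, lon_bounds):
    """cos(lat)-weighted mean of sst (time, lat, lon) over a lat/lon box."""
    lat_sel = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_sel = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = sst[:, lat_sel][:, :, lon_sel]
    weights = np.cos(np.deg2rad(lat[lat_sel]))[:, None] * np.ones(lon_sel.sum())
    return (sub * weights).sum(axis=(1, 2)) / weights.sum()

def detrend(series):
    """Remove the least-squares linear trend from a 1-D series."""
    t = np.arange(series.size, dtype=float)
    slope, intercept = np.polyfit(t, series, 1)
    return series - (slope * t + intercept)

def running_mean(series, window=4):
    """Running average; the output is shorter by window - 1 points."""
    return np.convolve(series, np.ones(window) / window, mode="valid")

def projected_index(sst_anom, pattern):
    """Project anomalies (time, lat, lon) onto a fixed pattern and normalize."""
    pc = np.tensordot(sst_anom, pattern, axes=([1, 2], [0, 1]))
    return (pc - pc.mean()) / pc.std()

# Toy grid and a spatially uniform field that warms linearly in time.
lat = np.arange(-88.0, 89.0, 4.0)
lon = np.arange(-180.0, 180.0, 4.0)
years = np.arange(1960, 2011)
field = np.ones((years.size, lat.size, lon.size)) * (0.01 * (years - 1960))[:, None, None]

# AMO-style index: box average over 0-70N, 80W-20W, detrended, then smoothed.
amo_raw = box_mean(field, lat, lon, (0.0, 70.0), (-80.0, -20.0))
amo = running_mean(detrend(amo_raw))

# PDO-style index: projection onto a hypothetical stand-in for the observed EOF.
amplitude = np.sin(np.arange(years.size))
pattern = np.cos(np.deg2rad(lat))[:, None] * np.ones(lon.size)
pdo = projected_index(amplitude[:, None, None] * pattern, pattern)
```

With a purely linear toy field the detrended AMO-style index is zero, which illustrates why detrending is used to isolate internal variability from the forced signal. A real application would restrict the PDO projection to the North Pacific poleward of 20°N and mask land points first.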
 The predictive skill for the AMO and PDO indices is measured by the correlation coefficient and root-mean-square error (RMSE) between the simulations and observations. Figure 4 shows the correlation coefficient as a function of lead time for the MME and the ensemble mean of individual models. As a reference for significance, the correlations and RMSE of the persistence prediction are included (Figure 4 and Figure S4 in Text S1). Horizontal lines in each figure represent the confidence level (Figure 4) and the observed standard deviation (Figure S4 in Text S1), respectively. For the AMO prediction, the correlation coefficients and RMSE of almost all models indicate significant skill (Figure 4a and Figure S4a in Text S1). After 1–4 years, the MME, HadCM3, CNRM and MIROC4h show greater skill than the persistence prediction. After 3–6 years, most of the models have significant skill that exceeds the persistence prediction (higher correlation than persistence, and smaller RMSE than both persistence and the observed amplitude). The MME gives more skillful results than most of the individual model predictions over the entire prediction period.
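The comparison against a persistence baseline can be sketched as below for a single lead window. The names are hypothetical and the numbers are toy values. How the persistence forecast is constructed is not stated in the text, so the sketch simply takes it as an input array (for example, the observed index averaged over the years before each start date, which is an assumption).

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation between two 1-D series."""
    return float(np.corrcoef(a, b)[0, 1])

def rmse(a, b):
    """Root-mean-square error between two 1-D series."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def compare_with_persistence(forecast, verification, persisted):
    """Skill of a model forecast and of a persistence forecast, side by side.

    All inputs are 1-D arrays over start dates for one lead window.
    """
    scores = {
        "corr_model": correlation(forecast, verification),
        "corr_persistence": correlation(persisted, verification),
        "rmse_model": rmse(forecast, verification),
        "rmse_persistence": rmse(persisted, verification),
        "obs_std": float(np.std(verification)),
    }
    scores["beats_persistence"] = (
        scores["corr_model"] > scores["corr_persistence"]
        and scores["rmse_model"] < scores["rmse_persistence"]
    )
    return scores

# Toy index values for ten start dates.
verification = np.array([0.10, -0.20, 0.30, 0.00, -0.10, 0.40, 0.20, -0.30, 0.10, 0.50])
forecast = verification + 0.05            # perfect phase, constant bias
persisted = np.roll(verification, 1)      # arbitrary placeholder baseline
scores = compare_with_persistence(forecast, verification, persisted)
```

Reporting `obs_std` alongside the RMSE mirrors the comparison with the observed amplitude: an RMSE above that value means the forecast error exceeds the typical size of the signal itself.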
 The prediction skill for the PDO index is lower than for the AMO, in agreement with recent studies [van Oldenborgh et al., 2012]. The correlation coefficient of the PDO index exceeds the 90% confidence level in the MME and CanCM4 for lead times of 1–4 and 2–5 years. CanCM4 remains above the 90% level for 3–6 years, and MIROC5 is far above the 95% level for 3–6 years. The MME shows a decrease in skill for lead times beyond 3–6 years (Figure 4b). For almost all models, the correlation coefficients indicate no significant skill for the PDO index over the entire period. The correlation coefficient is lower than that of the persistence prediction, and the errors of all models are larger than the observed PDO amplitude. The MME shows more skillful results than most of the individual model predictions.
 We have assessed the CMIP5 decadal hindcast/forecast simulation performance of seven state-of-the-art ocean-atmosphere coupled models. Most of the models produce a cooler-than-observed global mean temperature during the entire period and overestimate the observed trend in their hindcasts. All models show high prediction skill for surface temperature up to 6–9 years over the Indian Ocean, the North Atlantic and the western Pacific Oceans, while showing lower predictive skill over the equatorial Pacific and North Pacific Ocean. The AMO index is relatively well predicted in all models for the entire prediction period with significant skill, while the predictive skill for the PDO index is relatively low for the entire period.
 Although the MME does not outperform all of the constituent models for every forecast skill metric, it has in general better forecast quality than the single models for global mean temperature, AMO and PDO. This study partly supports the utility of the multi-model ensemble approach in overcoming the systematic model biases from individual models and in enhancing decadal predictability. It should be noted that not all modeling centers have thus far released their decadal predictions for CMIP5. Additional intercomparison will be conducted when the other CMIP5 simulations are made available.
 The constructive comments of two anonymous reviewers are greatly appreciated. This research has been supported by the National Science Foundation under awards AGS 1125261 and AGS 0965610, and by the APEC Climate Center.
 The Editor thanks the two anonymous reviewers for their assistance in evaluating this paper.