We introduce an improved initialization to the decadal predictions performed for the Mittelfristige Klimaprognosen (MiKlip) project based on the Max-Planck-Institute Earth System Model and furthermore test the effect of increased ocean and atmosphere model resolutions. The new initialization includes both a more sophisticated oceanic initialization and additionally an atmospheric initialization. We compare the performance of retrospective decadal forecasts over the past 50 years with that of the previous system. The new oceanic initialization considerably improves the performance in terms of surface air temperature over the tropical oceans on the 2–5 years time scale, which also helps to improve the predictive skill of global mean surface air temperature on this time scale. The higher model resolution improves the predictive skill of surface air temperature over the tropical Pacific even further. Through the newly introduced atmospheric initialization, the quasi-biennial oscillation exhibits predictive skill of up to 4 years when a sufficiently high vertical atmospheric resolution is used.
 Decadal climate prediction is a relatively new research field. After the first pioneering work [Smith et al., 2007; Keenlyside et al., 2008; Pohlmann et al., 2009], a comprehensive set of decadal climate predictions with different systems was performed only recently as part of the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al., 2009, 2012]. For CMIP5, retrospective forecasts, so-called hindcasts, were performed over the period between 1960 and 2010 to be assessed in the upcoming fifth assessment report of the Intergovernmental Panel on Climate Change [e.g., Goddard et al., 2013; Doblas-Reyes et al., 2013; Meehl et al., 2013]. The Max Planck Institute for Meteorology contributes to CMIP5 with hindcasts from its decadal prediction system, which is based on the climate model Max-Planck-Institute Earth System Model (MPI-ESM) [Giorgetta et al., 2013; Stevens et al., 2013; Jungclaus et al., 2013]. This system (named here baseline-0) makes use of an oceanic initialization from a forced ocean simulation. Within the MiKlip project, we develop a coupled oceanic and atmospheric initialization and additionally test the effect of increased model resolution. In this paper, we show the improvements achieved with the new system (named here baseline-1) and the effect of model resolution on predictive skill.
Müller et al.  analyze the predictive skill of the baseline-0 (b0) system. They show that the initialization of MPI-ESM improves forecast skill with respect to the uninitialized experiment predominantly over the North Atlantic for all lead times and over parts of Europe for multiyear seasonal means. However, negative skill scores over the tropical Pacific reflect a systematic error in the initialization. As a consequence, the overall skill, for example, in terms of global mean temperature, is lower than in other systems [Bellucci et al., 2012]. The reason for this problem is not fully understood. Flaws in the wind forcing [Lee et al., 2013] of the ocean model may cause an overly strong ocean response that forces the coupled model to adjust by inducing unrealistic heat fluxes.
 Building on our experience from testing three different ocean initializations [Kröger et al., 2012], we here initialize the ocean component with the newest oceanic reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) [Balmaseda et al., 2012]. Additionally, the positive experience of other decadal prediction groups with the initialization of the atmosphere [e.g., Smith et al., 2007] led us to also introduce an initialization of this component from ECMWF atmospheric reanalysis data [Uppala et al., 2005; Dee et al., 2011]. In the remainder of this paper, we give an overview of the two systems followed by an analysis of their differences in performance. Particular emphasis is placed on the tropical region, for which the skill in surface temperature prediction improves substantially. We particularly highlight predictive skill for the quasi-biennial oscillation (QBO) of equatorial stratospheric zonal winds [Baldwin et al., 2001].
2 Simulations and Methods
 The initialization method is briefly summarized here: Estimates of the oceanic temperature and salinity fields for the period 1948–2012 are produced by forcing the Max Planck Institute ocean model [Jungclaus et al., 2013] with daily fluxes of momentum, heat, and freshwater taken from the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis [Kalnay et al., 1996]. The anomaly technique [Pierce et al., 2004; Smith et al., 2013] is used to initialize the decadal hindcasts with these fields. An ensemble simulation of three decadal hindcasts is started with MPI-ESM from consecutive days around 1 January each year from 1961 to 2012. The MPI-ESM in low resolution (LR, atmosphere: T63L47, ocean: 1.5°L40) is employed for this set of simulations.
 The baseline-1 (b1) system uses the same coupled model MPI-ESM as the baseline-0 (b0) system. However, the oceanic component is initialized with temperature and salinity anomalies from the ocean reanalysis system 4 (ORAS4) from ECMWF [Balmaseda et al., 2012]. Additionally, the atmospheric component is initialized with full-field 3-D temperature, vorticity, divergence, and surface pressure fields from the ECMWF Re-Analysis (ERA)-40 [Uppala et al., 2005] for the period 1960–1989 and from ERA-Interim [Dee et al., 2011] for the period 1990–2012. An ensemble of 10 decadal hindcasts is started around 1 January in the same way as in b0 (lagged initialization) over the period 1961–2012 with the LR system. A higher oceanic resolution could potentially improve the climate predictions [e.g., Kirtman et al., 2012], and a higher vertical atmospheric resolution might resolve stratospheric processes more realistically [Marshall and Scaife, 2010; Charlton-Perez et al., 2013]. Therefore, the method is repeated with the mixed resolution (MR, atmosphere: T63L95, ocean: 0.4°L40) version of MPI-ESM, but with a smaller ensemble of only five members, again started around 1 January with lagged initialization over the period 1961–2012.
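The difference between anomaly and full-field initialization can be illustrated with a minimal sketch for a single grid point. This is not the MiKlip implementation: the function names and the use of one time-mean climatology are assumptions made for illustration, and in practice the climatologies would depend on the calendar month and be computed over a common reference period.

```python
from statistics import fmean


def anomaly_initial_state(reanalysis_series, model_climatology):
    """Anomaly initialization (illustrative): add reanalysis anomalies,
    taken relative to the reanalysis' own climatology, to the model climatology."""
    reanalysis_climatology = fmean(reanalysis_series)
    return [
        model_climatology + (value - reanalysis_climatology)
        for value in reanalysis_series
    ]


def full_field_initial_state(reanalysis_series):
    """Full-field initialization (illustrative): use the reanalysis values directly."""
    return list(reanalysis_series)


# Hypothetical temperatures (degC) at one grid point for three start dates.
reanalysis = [14.0, 15.0, 16.0]
model_mean = 17.0  # hypothetical model climatology, warmer than the reanalysis

anomaly_states = anomaly_initial_state(reanalysis, model_mean)
full_field_states = full_field_initial_state(reanalysis)
```

With anomaly initialization the model starts near its own mean state, so the model bias is retained and the drift after initialization is reduced. With full-field initialization the model starts from the reanalysis state itself.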
 To base our analysis on the same ensemble sizes, only results from the ensemble means of the first three ensemble members (the maximum number with yearly initialization in b0-LR) are shown. The ensemble mean generally outperforms individual ensemble members [Palmer et al., 2008]. We verified the robustness of the results by comparison with other combinations of ensemble members wherever possible (b1-LR and b1-MR). The prediction skill is analyzed in the following section in terms of anomaly correlation [e.g., Wilks, 2011]. We also show root-mean-square error (RMSE) skill scores [e.g., Wilks, 2011] using the uninitialized model simulations as a reference [Goddard et al., 2013; Matei et al., 2012] in the supporting information of this paper. Significance is estimated using a block bootstrap method [e.g., Wilks, 2011] that accounts for autocorrelation, as in Goddard et al. [2013].
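The skill measures named above can be sketched for a single time series as follows. This is a simplified illustration, not the procedure of the cited studies: the RMSE skill score is assumed here to take the form 1 − RMSE(hindcast)/RMSE(reference), the block length and number of resamples are arbitrary choices, and all function names are hypothetical.

```python
import math
import random
from statistics import fmean


def anomaly_correlation(hindcast, obs):
    """Correlation between hindcast and observed anomalies (means removed)."""
    h_mean, o_mean = fmean(hindcast), fmean(obs)
    cov = sum((h - h_mean) * (o - o_mean) for h, o in zip(hindcast, obs))
    var_h = sum((h - h_mean) ** 2 for h in hindcast)
    var_o = sum((o - o_mean) ** 2 for o in obs)
    return cov / math.sqrt(var_h * var_o)


def rmse(forecast, obs):
    """Root-mean-square error of a forecast series against observations."""
    return math.sqrt(fmean([(f - o) ** 2 for f, o in zip(forecast, obs)]))


def rmse_skill_score(hindcast, reference, obs):
    """Assumed form: 1 is perfect, 0 equals the reference, negative is worse."""
    return 1.0 - rmse(hindcast, obs) / rmse(reference, obs)


def block_bootstrap_correlations(hindcast, obs, block_length=5,
                                 n_resamples=1000, seed=0):
    """Resample contiguous blocks of start dates (with replacement) so that
    autocorrelation within each block is preserved, and return the resulting
    distribution of anomaly correlations."""
    rng = random.Random(seed)
    n = len(obs)
    samples = []
    for _ in range(n_resamples):
        idx = []
        while len(idx) < n:
            start = rng.randrange(n - block_length + 1)
            idx.extend(range(start, start + block_length))
        idx = idx[:n]  # trim the last block to the original series length
        samples.append(anomaly_correlation([hindcast[i] for i in idx],
                                           [obs[i] for i in idx]))
    return samples
```

One possible significance criterion, again an assumption, is to regard a positive correlation as significant at the 5% level when fewer than 5% of the resampled correlations are zero or negative.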
 The variability of the ensemble mean 2 m air temperature from the b0-LR and b1-LR hindcasts is verified against observations from the Hadley Centre/Climate Research Unit data set HadCRUT3v [Brohan et al., 2006] for different prediction lead times in Figure 1. The anomaly correlation skill of the first prediction year (Figures 1a and 1c) is positive and significant almost everywhere, mainly reflecting that the observed warming trend over the period 1961–2012 is correctly represented in the first prediction year. This result is similar to findings in other studies [Kim et al., 2012; Hazeleger et al., 2013]. For hindcasts averaged over years 2–5, however, negative correlation skill appears in the b0-LR system in the tropics and eastern North Pacific, with the highest magnitudes in the tropical East Pacific (Figure 1b). A detailed analysis of the region with the negative predictive skill (not shown) reveals that the observed warming trend, with relatively cool years in the 1960s and 1970s and relatively warm years in the 1990s and 2000s, is reversed in the hindcasts over this prediction lead time. The problem is reduced in the b1-LR system: In the tropical Atlantic, Indian Ocean, and western Pacific, the correlation skill is positive almost everywhere (Figure 1d).
 For the first prediction year, b1-LR results in significant improvements over b0-LR in areas of the tropical and North Pacific, North Atlantic, and Southern Ocean (Figure 2a). For the hindcasts averaged over years 2–5, considerable improvement is achieved almost everywhere in the tropics (Figure 2b). A sensitivity study without assimilating the atmosphere (not shown) reveals that the skill improvement is mainly due to the different oceanic initializations. The step from b1-LR to b1-MR has only a small effect for the first prediction year (Figure 2c). However, for hindcasts averaged over years 2–5, an additional improvement is achieved in the tropical Pacific with the higher model resolution (Figure 2d). Very similar results are obtained for the RMSE skill scores (Figures S1 and S2 in the supporting information).
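The lead-time averages used above (for example, prediction years 2–5 of the three-member ensemble mean) can be formed as in the following minimal sketch. The data layout and function name are hypothetical. The sketch assumes annual-mean values per ensemble member, with the first entry being prediction year 1.

```python
from statistics import fmean


def lead_time_average(hindcast, first_year=2, last_year=5, n_members=3):
    """Average the ensemble mean over prediction years first_year..last_year.

    hindcast[member][lead_index] holds annual means for one start date;
    lead_index 0 corresponds to prediction year 1 (assumed layout).
    Only the first n_members members are used.
    """
    members = hindcast[:n_members]
    # Ensemble mean for each prediction year.
    ensemble_mean = [fmean(values) for values in zip(*members)]
    return fmean(ensemble_mean[first_year - 1:last_year])


# Hypothetical 10-year hindcast with four members; the fourth is ignored.
example = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
    [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    [100.0] * 10,
]
years_2_to_5 = lead_time_average(example)
year_1 = lead_time_average(example, first_year=1, last_year=1)
```

Repeating this for every start date yields one time series per lead-time window, which can then be compared with the correspondingly averaged observations.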
 The negative correlation skill over the tropical Pacific in b0-LR also affects the correlation skill of the global mean surface air temperature. The analysis of annual averages shows that the hindcasts with lead times of 2 and 3 years are problematic in b0-LR (Figure 3a). Even the 1–4 and 2–5 years lead time global mean temperature averages are clearly affected by this problem (Figure 3b). As a consequence, the skill of the b0-LR system is lower than in other decadal prediction systems [Bellucci et al., 2012]. Apart from this problem, for the global mean surface air temperature, the hindcasts of the different baseline systems are relatively similar to the uninitialized simulations for all lead times.
 In the following, we analyze the effect of the initialization of the atmosphere in the baseline-1 system as compared to the baseline-0 hindcasts with an uninitialized atmosphere. A candidate for multiannual predictive skill of the atmosphere is the quasi-biennial oscillation in the stratosphere. The winds in the equatorial stratosphere change direction with a period of roughly 28 months [Baldwin et al., 2001]. We here define a QBO index as the time series of monthly and zonal mean zonal wind anomalies at 20 hPa averaged between 10°S and 10°N. The relatively low vertical atmospheric resolution in MPI-ESM-LR does not allow for spontaneous QBO variability; however, the MPI-ESM-MR produces a spontaneous QBO due to its higher vertical atmospheric resolution [Schmidt et al., 2013; Krismer et al., 2013]. Consequently, the LR runs without atmospheric initialization do not show any QBO variability (Figure 4a). With atmospheric initialization in the b1-LR model, the initial phase of the QBO is in alignment with observations, but thereafter the QBO amplitude decays on time scales of the order of months (Figures 4a and 4c), as the model at low vertical resolution cannot simulate the wave-mean flow interaction that is essential to simulate the QBO. This is similar to the time scale found in earlier studies [e.g., Hamilton and Yuan, 1992]. With the higher vertical atmospheric resolution in the b1-MR system, however, the QBO is simulated and, due to the initialization, remains in alignment with observations well beyond the first 12 months (Figure 4d). The region with significant predictive skill extends from about 15°S to 15°N and 10 to 70 hPa for the prediction lead time of 13–24 months (Figure 4e). The analysis of all lead times shows that only in the b1-MR system does the atmospheric initialization lead to predictive skill that remains significant for up to 4 years (Figure 4f). All other systems (b0-LR, b0-MR (not shown), and b1-LR) lack either the ability to simulate the QBO or the initialization of the atmosphere and hence the QBO.
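The QBO index defined above can be sketched as follows for monthly zonal-mean zonal wind at 20 hPa. Two choices here are assumptions that the text does not specify: the latitude average is weighted by the cosine of latitude, and anomalies are taken relative to the mean over the whole series. The function name and data layout are hypothetical.

```python
import math
from statistics import fmean


def qbo_index(u_zonal_mean, latitudes, lat_bound=10.0):
    """Monthly QBO index from zonal-mean zonal wind at 20 hPa.

    u_zonal_mean[t][j] is the wind (m/s) in month t at latitudes[j] (degrees).
    Returns the band-averaged wind between -lat_bound and +lat_bound,
    with the time mean removed.
    """
    band = [j for j, lat in enumerate(latitudes) if abs(lat) <= lat_bound]
    # Cosine-of-latitude weights (assumed area weighting).
    weights = [math.cos(math.radians(latitudes[j])) for j in band]
    total = sum(weights)
    band_mean = [
        sum(w * row[j] for w, j in zip(weights, band)) / total
        for row in u_zonal_mean
    ]
    time_mean = fmean(band_mean)
    return [value - time_mean for value in band_mean]


# Hypothetical two-month example: westerly phase followed by easterly phase.
lats = [-20.0, -10.0, 0.0, 10.0, 20.0]
winds = [
    [99.0, 10.0, 10.0, 10.0, 99.0],
    [99.0, -10.0, -10.0, -10.0, 99.0],
]
index = qbo_index(winds, lats)
```

Values outside the 10°S–10°N band do not enter the index, so the large placeholder winds at ±20° in the example have no effect.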
4 Discussion and Conclusions
 Although progress in terms of prediction quality is shown for the new MiKlip decadal prediction system baseline-1 compared to baseline-0, some smaller issues, such as the relatively low predictive skill in the tropical East Pacific, are still present. Additionally, the absence of the pause in global warming (so-called “hiatus” period) after the year 2000 in historical runs in many climate models (including ours) [Meehl et al., 2011; Guemas et al., 2013; Kosaka and Xie, 2013] may also affect decadal forecasts. In both of our systems (b0 and b1), the hindcasts that are started during the hiatus period initially stay close to observations, but they drift after only a few years toward a state that is too warm. Caution is therefore required when actual forecasts are issued, as this drift may limit skill to much less than a decade. Since decadal climate prediction builds a bridge between climate projections and observations through initialization, studying this drift may help to understand and solve this problem.
 The improvements achieved from the transition from baseline-0 toward baseline-1 include the global mean surface air temperature. The problem of negative predictive skill over the tropical oceans, which is present for the 2–5 years prediction lead time in the baseline-0 system, is largely reduced in the baseline-1 system. This is mainly achieved by an improved oceanic initialization with data from the ORAS4 reanalysis. Additionally, the new atmospheric initialization in baseline-1 allows a skillful prediction of the QBO up to 48 months when the MR model is used. The predictability of the QBO bears a large potential impact on the variability of the circulation in the tropics but possibly also in the (stratospheric) extratropics [e.g., Baldwin et al., 2001].