
Improved forecast skill in the tropics in the new MiKlip decadal climate predictions



[1] We introduce an improved initialization to the decadal predictions performed for the Mittelfristige Klimaprognosen (MiKlip) project based on the Max-Planck-Institute Earth System Model and furthermore test the effect of increased ocean and atmosphere model resolutions. The new initialization includes both a more sophisticated oceanic initialization and additionally an atmospheric initialization. We compare the performance of retrospective decadal forecasts over the past 50 years with that of the previous system. The new oceanic initialization considerably improves the performance in terms of surface air temperature over the tropical oceans on the 2–5 years time scale, which also helps to improve the predictive skill of global mean surface air temperature on this time scale. The higher model resolution improves the predictive skill of surface air temperature over the tropical Pacific even further. Through the newly introduced atmospheric initialization, the quasi-biennial oscillation exhibits predictive skill of up to 4 years when a sufficiently high vertical atmospheric resolution is used.

1 Introduction

[2] Decadal climate prediction is a relatively new research field. After the first pioneering work [Smith et al., 2007; Keenlyside et al., 2008; Pohlmann et al., 2009], a comprehensive set of decadal climate predictions with different systems was performed only recently as part of the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al., 2009, 2012]. For CMIP5, retrospective forecasts, so-called hindcasts, were performed over the period between 1960 and 2010 to be assessed in the upcoming fifth assessment report of the Intergovernmental Panel on Climate Change [e.g., Goddard et al., 2013; Doblas-Reyes et al., 2013; Meehl et al., 2013]. The Max Planck Institute for Meteorology contributes to CMIP5 with hindcasts from its decadal prediction system, which is based on the climate model Max-Planck-Institute Earth System Model (MPI-ESM) [Giorgetta et al., 2013; Stevens et al., 2013; Jungclaus et al., 2013]. This system (named here baseline-0) makes use of an oceanic initialization from a forced ocean simulation. Within the MiKlip project, we develop a coupled oceanic and atmospheric initialization and additionally test the effect of increased model resolution. In this paper, we show the improvements achieved with the new system (named here baseline-1) and the effect of model resolution on predictive skill.

[3] Müller et al. [2012] analyze the predictive skill of the baseline-0 (b0) system. They show that the initialization of MPI-ESM improves forecast skill with respect to the uninitialized experiment predominantly over the North Atlantic for all lead times and over parts of Europe for multiyear seasonal means. However, negative skill scores over the tropical Pacific reflect a systematic error in the initialization. As a consequence, the overall skill, for example, in terms of global mean temperature, is lower than in other systems [Bellucci et al., 2012]. The reason for this problem is not fully understood. Flaws in the wind forcing [Lee et al., 2013] of the ocean model may cause an overly strong ocean response that forces the coupled model to adjust by inducing unrealistic heat fluxes.

[4] Building on our experience from testing three different ocean initializations [Kröger et al., 2012], we here initialize the ocean component with the newest oceanic reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) [Balmaseda et al., 2012]. Additionally, the positive experience of other decadal prediction groups with the initialization of the atmosphere [e.g., Smith et al., 2007] led us to also introduce an initialization of this component from ECMWF atmospheric reanalysis data [Uppala et al., 2005; Dee et al., 2011]. In the remainder of this paper, we give an overview of the two systems followed by an analysis of their differences in performance. Particular emphasis is placed on the tropical region, for which the skill in surface temperature prediction improves substantially. We particularly highlight predictive skill for the quasi-biennial oscillation (QBO) of equatorial stratospheric zonal winds [Baldwin et al., 2001].

2 Simulations and Methods

[5] The initialization method of the baseline-0 (b0) system is briefly summarized here: Estimates of the oceanic temperature and salinity fields for the period 1948–2012 are produced by forcing the Max Planck Institute ocean model [Jungclaus et al., 2013] with daily fluxes of momentum, heat, and freshwater taken from the National Centers for Environmental Prediction/National Center for Atmospheric Research reanalysis [Kalnay et al., 1996]. The anomaly technique [Pierce et al., 2004; Smith et al., 2013] is used to initialize the decadal hindcasts with these fields. An ensemble of three decadal hindcasts is started with MPI-ESM from consecutive days around 1 January each year from 1961 to 2012. The MPI-ESM in low resolution (LR, atmosphere: T63L47, ocean: 1.5°L40) is employed for this set of simulations.
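The idea behind the anomaly technique can be illustrated with a minimal sketch. It assumes the common formulation in which the observed (here: reanalysis or forced-ocean) anomaly is added to the model's own climatology. The function name `anomaly_initial_state` and the toy numbers are hypothetical and not taken from the MiKlip system, and the way the fields are actually introduced into the coupled model is not represented.

```python
import numpy as np

def anomaly_initial_state(obs_state, obs_climatology, model_climatology):
    """Target state for anomaly initialization (illustrative only).

    The observed anomaly (obs_state - obs_climatology) is added to the
    model climatology, so the model starts close to its own mean state
    while carrying the observed departure from the mean.
    """
    obs_anomaly = np.asarray(obs_state) - np.asarray(obs_climatology)
    return np.asarray(model_climatology) + obs_anomaly

# Toy ocean temperatures (degrees C) at three hypothetical grid points.
obs_state = np.array([15.2, 20.1, 27.9])
obs_climatology = np.array([15.0, 20.0, 28.0])
model_climatology = np.array([14.0, 21.0, 27.0])

initial_state = anomaly_initial_state(obs_state, obs_climatology, model_climatology)
print(initial_state)
```

In contrast, a full-field initialization (as used for the atmosphere in the b1 system described below) would take `obs_state` itself as the target.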

[6] The baseline-1 (b1) system uses the same coupled model MPI-ESM as the baseline-0 (b0) system. However, the oceanic component is initialized with temperature and salinity anomalies from the ocean reanalysis system 4 (ORAS4) from ECMWF [Balmaseda et al., 2012]. Additionally, the atmospheric component is initialized with full-field 3-D temperature, vorticity, divergence, and surface pressure fields from the ECMWF Re-Analysis (ERA)-40 [Uppala et al., 2005] for the period 1960–1989 and from ERA-Interim [Dee et al., 2011] for the period 1990–2012. An ensemble of 10 decadal hindcasts is started around 1 January in the same way as in b0 (lagged initialization) over the period 1961–2012 with the LR system. However, a higher oceanic resolution would potentially improve the climate predictions [e.g., Kirtman et al., 2012], and a higher vertical atmospheric resolution might resolve stratospheric processes more realistically [Marshall and Scaife, 2010; Charlton-Perez et al., 2013]. Therefore, the method is repeated with the mixed resolution (MR, atmosphere: T63L95, ocean: 0.4°L40) version of MPI-ESM but with a smaller ensemble of only five members, again started around 1 January in the same way as in b0 (lagged initialization) over the period 1961–2012.

[7] To base our analysis on the same ensemble sizes, only results from the ensemble means of the first three ensemble members (the maximum number with yearly initialization in b0-LR) are shown. The ensemble mean generally outperforms individual ensemble members [Palmer et al., 2008]. We have verified the robustness of the results by comparison with other combinations of ensemble members wherever possible (b1-LR and b1-MR). The prediction skill is analyzed in the following section in terms of anomaly correlation [e.g., Wilks, 2011]. We also show root-mean-square error (RMSE) skill scores [e.g., Wilks, 2011] using the uninitialized model simulations as a reference [Goddard et al., 2013; Matei et al., 2012] in the supporting information of this paper. Significance is estimated using a block bootstrap method [e.g., Wilks, 2011] accounting for autocorrelation as in Goddard et al. [2013].
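The three verification ingredients named above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code of this study: anomalies are taken relative to the time mean of each series, the RMSE skill score is written in one common form, 1 - RMSE_hindcast / RMSE_reference, and a simple moving-block bootstrap with a hypothetical block length of 5 years and 1000 resamples stands in for the procedure of Goddard et al. [2013]. All function names and the toy data are invented for the example.

```python
import numpy as np

def anomaly_correlation(hindcast, obs):
    """Correlation of anomalies taken relative to each series' time mean."""
    h = hindcast - hindcast.mean()
    o = obs - obs.mean()
    return float(np.sum(h * o) / np.sqrt(np.sum(h**2) * np.sum(o**2)))

def rmse_skill_score(hindcast, reference, obs):
    """1 - RMSE(hindcast) / RMSE(reference); positive if the hindcast is better."""
    rmse_h = np.sqrt(np.mean((hindcast - obs) ** 2))
    rmse_r = np.sqrt(np.mean((reference - obs) ** 2))
    return float(1.0 - rmse_h / rmse_r)

def block_bootstrap_interval(hindcast, obs, block_length=5, n_resamples=1000,
                             levels=(5.0, 95.0), seed=0):
    """Percentile interval of the anomaly correlation from a moving-block bootstrap.

    Resampling contiguous blocks of years (instead of single years) keeps
    part of the autocorrelation of the series in each resample.
    """
    rng = np.random.default_rng(seed)
    n = len(obs)
    n_blocks = int(np.ceil(n / block_length))
    offsets = np.arange(block_length)
    scores = np.empty(n_resamples)
    for i in range(n_resamples):
        starts = rng.integers(0, n - block_length + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        scores[i] = anomaly_correlation(hindcast[idx], obs[idx])
    low, high = np.percentile(scores, levels)
    return float(low), float(high)

# Toy data: a warming trend plus noise (not real observations or hindcasts).
rng = np.random.default_rng(1)
years = np.arange(1961, 2013)
trend = 0.01 * (years - years[0])
obs = trend + 0.1 * rng.standard_normal(years.size)
hindcast = obs + 0.05 * rng.standard_normal(years.size)
uninitialized = trend + 0.1 * rng.standard_normal(years.size)

acc = anomaly_correlation(hindcast, obs)
skill_score = rmse_skill_score(hindcast, uninitialized, obs)
low, high = block_bootstrap_interval(hindcast, obs)
print(acc, skill_score, (low, high))
```

In this sketch, a correlation would be called significant if the bootstrap interval excludes zero; the exact criterion used for the figures may differ.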

3 Results

[8] The variability of the ensemble mean 2 m air temperature from the b0-LR and b1-LR hindcasts is verified against observations from Hadley Centre and Climate Research Unit (HadCRUT)3v [Brohan et al., 2006] for different prediction lead times in Figure 1. The anomaly correlation skill of the first prediction year (Figures 1a and 1c) is positive and significant almost everywhere, mainly reflecting that the observed warming trend over the period 1961–2012 is correctly represented in the first prediction year. This result is similar to findings in other studies [Kim et al., 2012; Hazeleger et al., 2013]. For hindcasts averaged over years 2–5, however, negative correlation skill appears in the b0-LR system in the tropics and eastern North Pacific with highest magnitudes in the tropical East Pacific (Figure 1b). A detailed analysis of the region with the negative predictive skill (not shown) reveals that the observed warming trend with relatively cool years in the 1960s and 1970s and relatively warm years in the 1990s and 2000s is reversed in the hindcasts over this prediction lead time. The problem is reduced in the b1-LR system: In the tropical Atlantic, Indian Ocean, and western Pacific, the correlation skill is positive almost everywhere (Figure 1d).
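How a lead-time average such as "years 2–5" is paired with observations before computing a skill measure can be sketched as follows. The sketch assumes annual-mean data and that lead year 1 coincides with the calendar year in which a hindcast is started (hindcasts start around 1 January); the function name `lead_window_pairs`, the array layout, and the toy data are hypothetical.

```python
import numpy as np

def lead_window_pairs(hindcasts, start_years, obs, obs_years, first_lead, last_lead):
    """Pair lead-window means of hindcasts with observed means over the same years.

    hindcasts: array (n_starts, n_lead_years) of ensemble-mean annual values.
    Lead years are counted from 1; lead year 1 is the start year (assumption).
    Windows that extend beyond the observational record are skipped.
    """
    year_to_index = {int(y): i for i, y in enumerate(obs_years)}
    forecast_means, observed_means = [], []
    for row, start in zip(hindcasts, start_years):
        target_years = [int(start) + lead - 1 for lead in range(first_lead, last_lead + 1)]
        if not all(y in year_to_index for y in target_years):
            continue
        forecast_means.append(row[first_lead - 1:last_lead].mean())
        observed_means.append(np.mean([obs[year_to_index[y]] for y in target_years]))
    return np.array(forecast_means), np.array(observed_means)

# Toy example: "observations" rise by one unit per year and the hindcasts are perfect.
obs_years = np.arange(1961, 1975)
obs = (obs_years - 1961).astype(float)
start_years = np.arange(1961, 1971)
leads = np.arange(10)
hindcasts = (start_years[:, None] - 1961 + leads[None, :]).astype(float)

fc_2to5, ob_2to5 = lead_window_pairs(hindcasts, start_years, obs, obs_years, 2, 5)
print(fc_2to5[:3], ob_2to5[:3])
```

The two returned series, one value per start date, are what a correlation over start dates would be computed from for a given lead window.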

Figure 1.

Maps of ensemble mean hindcast skill (anomaly correlation) of surface air temperature averaged over the (a and c) first prediction year and (b and d) years 2–5 for b0-LR in Figures 1a and 1b and b1-LR in Figures 1c and 1d against observations from HadCRUT3v over the period 1961–2012. Crosses denote skill exceeding the 5–95% confidence level.

[9] For the first prediction year, b1-LR results in significant improvements over b0-LR in areas of the tropical and North Pacific, North Atlantic, and Southern Ocean (Figure 2a). For the hindcasts averaged over years 2–5, considerable improvement is achieved almost everywhere in the tropics (Figure 2b). A sensitivity study without assimilating the atmosphere (not shown) reveals that the skill improvement is mainly due to the different oceanic initializations. The step from b1-LR to b1-MR has only a small effect for the first prediction year (Figure 2c). However, for hindcasts averaged over years 2–5, an additional improvement is achieved in the tropical Pacific with the higher model resolution (Figure 2d). Very similar results are obtained for the RMSE skill scores (Figures S1 and S2 in the supporting information).

Figure 2.

Differences of anomaly correlation skill. (a and b) b1-LR minus b0-LR. (c and d) b1-MR minus b1-LR. Year 1 in Figures 2a and 2c. Years 2–5 in Figures 2b and 2d. Crosses denote differences exceeding the 5–95% confidence level.

[10] The negative correlation skill over the tropical Pacific in b0-LR also affects the correlation skill of the global mean surface air temperature. The analysis of annual averages shows that the hindcasts with lead times of 2 and 3 years are problematic in b0-LR (Figure 3a). Even the 1–4 and 2–5 years lead time global mean temperature averages are clearly affected by this problem (Figure 3b). As a consequence, the skill of the b0-LR system is lower than in other decadal prediction systems [Bellucci et al., 2012]. Apart from this problem, for the global mean surface air temperature, the hindcasts of the different baseline systems are relatively similar to the uninitialized simulations for all lead times.

Figure 3.

Hindcast skill (anomaly correlation) of the global and ensemble mean surface air temperature as a function of lead time verified against observations from HadCRUT3v for b0-LR (blue), b1-LR (green), b1-MR (red), and uninitialized (black) for (a) annual averages and (b) 4-year averages. Dashed black and dotted black lines denote skill and difference of skill between b1-MR and b0-LR exceeding the 95% confidence level.

[11] In the following, we analyze the effect of the initialization of the atmosphere in the baseline-1 system as compared to the baseline-0 hindcasts with an uninitialized atmosphere. A candidate for multiannual predictive skill of the atmosphere is the quasi-biennial oscillation in the stratosphere. The winds in the equatorial stratosphere change direction with a period of roughly 28 months [Baldwin et al., 2001]. We here define a QBO index as the time series of monthly and zonal mean zonal wind anomalies at 20 hPa averaged between 10°S and 10°N. The relatively low vertical atmospheric resolution in MPI-ESM-LR does not allow for spontaneous QBO variability; however, the MPI-ESM-MR produces a spontaneous QBO due to its higher vertical atmospheric resolution [Schmidt et al., 2013; Krismer et al., 2013]. Given this situation, it is clear that the LR runs without the atmospheric initialization do not reveal any QBO variability (Figure 4a). With atmospheric initialization in the b1-LR model, the initial phase of the QBO is in alignment with observations, but thereafter the QBO amplitude decays on time scales of the order of months (Figures 4a and 4c), as the model at low vertical resolution cannot simulate the wave–mean flow interaction that is essential to simulate the QBO. This is similar to the time scale found in earlier studies [e.g., Hamilton and Yuan, 1992]. With the higher vertical atmospheric resolution in the b1-MR system, however, the QBO is simulated and, due to the initialization, remains in alignment with observations well beyond the first 12 months (Figure 4d). The region with significant predictive skill extends from about 15°S to 15°N and 10 to 70 hPa for the prediction lead time of 13–24 months (Figure 4e). The analysis of all lead times shows that only in the b1-MR system does the atmospheric initialization lead to predictive skill that remains significant for up to 4 years (Figure 4f). All other systems, b0-LR, b0-MR (not shown), and b1-LR, lack either the ability to simulate the QBO or the initialization of the atmosphere and hence of the QBO.
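The QBO index defined above can be sketched as follows. The text fixes the level (20 hPa), the latitude band (10°S to 10°N), and the use of monthly, zonal mean anomalies; the remaining choices are assumptions made for this illustration: anomalies are taken relative to the mean annual cycle, the latitude average is weighted by the cosine of latitude, and a 12-month running mean is applied as in the caption of Figure 4. The function name `qbo_index`, the grid, and the synthetic wind field are hypothetical.

```python
import numpy as np

def qbo_index(u20, lat, running_mean_months=12):
    """QBO index from monthly zonal wind at 20 hPa (illustrative sketch).

    u20: array (n_months, n_lat, n_lon), assumed to start in January.
    lat: latitudes in degrees north, shape (n_lat,).
    """
    u_zonal = u20.mean(axis=2)                      # zonal mean
    band = (lat >= -10.0) & (lat <= 10.0)           # 10S to 10N
    weights = np.cos(np.deg2rad(lat[band]))         # area weighting (assumption)
    u_eq = (u_zonal[:, band] * weights).sum(axis=1) / weights.sum()
    # Anomalies relative to the mean annual cycle (assumption).
    month = np.arange(u_eq.size) % 12
    climatology = np.array([u_eq[month == m].mean() for m in range(12)])
    anomalies = u_eq - climatology[month]
    # Running mean; only fully covered windows are returned.
    kernel = np.ones(running_mean_months) / running_mean_months
    return np.convolve(anomalies, kernel, mode="valid")

# Synthetic wind field: a 28-month oscillation with 20 m/s amplitude.
n_months = 336
t = np.arange(n_months)
lat = np.linspace(-30.0, 30.0, 25)
signal = 20.0 * np.sin(2.0 * np.pi * t / 28.0)
u20 = signal[:, None, None] * np.ones((1, lat.size, 8))

index = qbo_index(u20, lat)
print(index.shape, np.abs(index).max())
```

Note that the 1-year running mean damps a 28-month oscillation noticeably, so the smoothed index has a smaller amplitude than the unsmoothed wind anomalies.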

Figure 4.

(a and b) Time series of monthly and zonal mean zonal wind anomalies at 20 hPa averaged between 10°S and 10°N (QBO) from ERA-40 and ERA-Interim reanalyses (black) together with ensemble means of the first 12 months of the hindcasts (combined into one time series) for b0-LR (blue) and b1-LR (green) in Figure 4a and b1-MR (red) together with the uninitialized (MR) simulation (grey shaded) in Figure 4b. (c and d) As in Figures 4a and 4b but for the hindcast months 13–24. (e) Hindcasts skill (anomaly correlation) of ensemble mean QBO of b1-MR for hindcast months 13–24 as a function of latitude and height. The shaded area exceeds the 95% confidence level. (f) Hindcast skill (anomaly correlation) of ensemble mean QBO at 20 hPa as a function of lead time for b0-LR (blue), b1-LR (green), b1-MR (red), and uninitialized (black). Dashed black and dotted black lines denote skill (being different to zero) and difference of skill between b1-MR and b1-LR exceeding the 95% confidence level. All data are smoothed with a 1-year running mean.

4 Discussion and Conclusions

[12] Although progress in terms of prediction quality is shown for the new MiKlip decadal prediction system baseline-1 compared to baseline-0, some smaller issues like the relatively low predictive skill in the tropical East Pacific are still present. Additionally, the absence of the pause in global warming (so-called “hiatus” period) after the year 2000 in historical runs in many climate models (including ours) [Meehl et al., 2011; Guemas et al., 2013; Kosaka and Xie, 2013] may also affect decadal forecasts. In both of our systems (b0 and b1), the hindcasts that are started during the hiatus period initially stay close to observations, but after only a few years they drift toward a state that is too warm. Caution is therefore required when actual forecasts are issued, as this drift may limit skill to much less than a decade. Since decadal climate prediction builds a bridge between climate projections and observations through initialization, studying this drift may help to understand and solve this problem.

[13] The improvements achieved from the transition from baseline-0 toward baseline-1 include the global mean surface air temperature. The problem of negative predictive skill over the tropical oceans, which is present for the 2–5 years prediction lead time in the baseline-0 system, is largely reduced in the baseline-1 system. This is mainly achieved by an improved oceanic initialization with data from the ORAS4 reanalysis. Additionally, the new atmospheric initialization in baseline-1 allows a skillful prediction of the QBO up to 48 months when the MR model is used. The predictability of the QBO bears a large potential impact on the variability of the circulation in the tropics but possibly also in the (stratospheric) extratropics [e.g., Baldwin et al., 2001].


[14] The research leading to these results has received funding from the German Federal Ministry for Education and Research (BMBF) projects MiKlip (http://www.fona-miklip.de/en/index.php) and RACE (http://race.zmaw.de) and the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement ENV.2012.6.1-1: Seasonal-to-decadal climate predictions toward climate services (http://www.specs-fp7.eu/) and ENV.2008. Comprehensive Modelling of the Earth System for Better Climate Prediction and Projection (http://www.combine-project.eu/). We have used observational data from Hadley Centre (http://www.metoffice.gov.uk/hadobs/hadcrut3/) and reanalyses from ECMWF (http://www.ecmwf.int) and NCEP/NCAR (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html). The climate simulations were performed at the German Climate Computing Centre (DKRZ). We thank Elisa Manzini, Marco Giorgetta, and two anonymous reviewers whose comments helped to improve this paper.

[15] The Editor thanks two anonymous reviewers for their assistance evaluating this manuscript.