Corresponding Author: J. Zhu, Center for Ocean-Land-Atmosphere Studies, Institute of Global Environment and Society, 4041 Powder Mill Rd., Ste. 302, Calverton, MD 20705, USA. (email@example.com)
 In this study, the impact of ocean initial conditions (OIC) on the prediction skill in the tropical Pacific Ocean is examined. Four sets of OIC are used to initialize the 12-month hindcasts of the tropical climate from 1979 to 2007, using the Climate Forecast System, version 2 (CFSv2), the current operational climate prediction model at the National Centers for Environmental Prediction (NCEP). These OIC are chosen from four ocean analyses produced by NCEP and the European Centre for Medium-Range Weather Forecasts (ECMWF). For each hindcast starting from a given OIC, four ensemble members are generated with different atmosphere and land initial states. The predictive skill in the tropical Pacific Ocean is assessed based on the ensemble mean hindcasts from each individual oceanic analysis as well as from multiple oceanic analyses. To reduce the climate drift from various oceanic analyses, an anomaly initialization strategy is used for all hindcasts. The results indicate that there is a substantial spread in the sea surface temperature (SST) prediction skill with different ocean analyses. Specifically, the ENSO prediction skill in terms of the anomaly correlation of the Niño-3.4 index can differ by as much as 0.1–0.2 at lead times longer than 2 months. The ensemble mean of the predictions initialized from all four ocean analyses gives prediction skill equivalent to the best one derived from an individual ocean analysis. It is suggested that more accurate OIC can improve the ENSO prediction skill and that an ensemble ocean initialization has the potential to enhance the skill at the present stage.
 The El Niño/Southern Oscillation (ENSO) in the tropical Pacific is the most important climate variation on seasonal-to-interannual time scales. Although ENSO originates and develops primarily in the tropical Pacific, it can have profound global effects. Over the past few decades, the ability of dynamical models to predict ENSO has improved significantly [e.g., Cane et al., 1986; Ji et al., 1994; Chen et al., 1995; Kirtman et al., 1997; Latif et al., 1998; Schneider et al., 1999; Kirtman et al., 2002; Zhang et al., 2003; Jin et al., 2008; Wang et al., 2010; Zhu et al., 2012]. Operational seasonal ENSO prediction is now routinely done using coupled general circulation models (CGCMs) in many major climate centers worldwide. This substantial progress in operational ENSO prediction has greatly benefitted from the adoption of operational ocean data assimilation (ODA) [e.g., Rosati et al., 1997; Balmaseda et al., 2010a], which assimilates the oceanic surface and subsurface observations, providing ocean initial conditions (OIC) for the prediction. It is expected that more accurate estimation of the anomalous OIC, such as the thermocline depth, through ODA analysis is a key factor for reliable ENSO forecasts.
 Also over the past few decades, a number of global ODA systems [Xue et al., 2011a, and references therein] have been developed to synthesize various observations with the physics described by ocean general circulation models (OGCMs) to represent the time evolving, three-dimensional state of the ocean. Previous studies [Zhu et al., 2011; Xue et al., 2011a] have shown that ODA analyses have achieved generally consistent estimates of the upper ocean heat content in the tropical Pacific Ocean, although there are still considerable qualitative differences among different ODA analyses in the tropical Atlantic [see Figure 1 in Zhu et al., 2011]. On the other hand, quantitative differences are still substantial among these ODA analyses, even in the tropical Pacific Ocean, in their estimates of the total temperatures and the temperature anomalies in the subsurface ocean [Xue et al., 2011b]. To what extent these uncertainties can affect ENSO predictive skill is a question that needs further examination. Moreover, if this effect is non-negligible, mitigating this uncertainty is a significant issue for the current practice of ENSO prediction.
 In this study, we explore the influence of different ocean analyses on ENSO prediction by carrying out hindcast experiments with a CGCM. We also consider the possibility of more reliable ocean predictions based on multiple ocean analyses. The CGCM, the experimental design and the datasets are described in Section 2. The results are presented in Section 3. A summary and discussion are given in Section 4.
2. Model, Hindcast Experiments and Datasets
 The coupled forecast model used in this study is the National Centers for Environmental Prediction (NCEP) Climate Forecast System, version 2 (CFSv2) (S. Saha et al., The NCEP Climate Forecast System Version 2, submitted to Journal of Climate, 2012), which became the operational forecast system for seasonal-to-interannual prediction in March 2011, replacing its predecessor, CFSv1. As a national climate model, CFSv1 has been particularly successful in seasonal-to-interannual climate forecasting, both retrospectively and operationally [e.g., Saha et al., 2006]. A preliminary verification of the hindcasts by CFSv2 also demonstrates good ENSO predictive skill.
 In CFSv2, the ocean model is the Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model (MOM) version 4 [Griffies et al., 2004], which is configured for the global ocean with a horizontal grid of 0.5° × 0.5° poleward of 30°S/30°N and meridional resolution increasing gradually to 0.25° between 10°S and 10°N. Vertically, it has 40 levels at constant depths (z-coordinate), with 27 levels in the upper 400 m. The maximum depth is approximately 4.5 km. The ocean model is coupled to a 3-layer global interactive dynamical sea-ice model with predicted fractional ice cover and thickness [Winton, 2000], and also includes a mean climatological river runoff specified at the model coastline [Griffies et al., 2004]. Other changes from CFSv1 to CFSv2 include replacing “virtual salt flux” with real freshwater flux and updated lateral mixing of both tracers and momentum [Gnanadesikan et al., 2006]. The atmospheric model is a lower resolution version of the Global Forecast System (GFS), which has been used for the operational numerical weather prediction and atmospheric reanalysis [e.g., Kalnay et al., 1996]. In CFSv2, the atmospheric component has a spectral horizontal resolution of T126 (105-km grid spacing), higher than that used in CFSv1, and 64 vertical levels in a hybrid sigma-pressure coordinate, also an upgrade from the sigma coordinate for CFSv1. The sub-grid scale physical parameterization package was also updated. The oceanic and atmospheric components exchange surface momentum, heat and freshwater fluxes, as well as SST, every 30 minutes. More details about CFSv2 can be found in Saha et al. (submitted manuscript, 2012).
 In this study, four sets of retrospective forecast experiments were conducted using CFSv2. The only difference among them was in the OIC, which are based on four different ocean analyses, two from NCEP and two from ECMWF. These analyses are the NCEP Climate Forecast System Reanalysis (CFSR) [Saha et al., 2010], the NCEP Global Ocean Data Assimilation System (GODAS) [Behringer, 2005], the ECMWF COMBINE-NV [Balmaseda et al., 2010b], and the ECMWF Ocean Reanalysis System 3 (ORA-S3) [Balmaseda et al., 2008]. The GODAS and CFSR (ORA-S3) ocean analyses have been used to initialize the operational seasonal predictions made by NCEP (ECMWF). Using these ocean analyses as OIC, the hindcast experiments start each April in 1979–2007, and last for 12 months. In comparison with the predictions with initial conditions from other seasons, hindcasts initialized in April generally have lower predictive skill, possibly due to the effect of the spring predictability barrier [Jin et al., 2008]. This initial time is chosen for our study because it is the crucial month to predict whether an El Niño event will occur in a particular year, and it represents a rigorous test of the forecast system's predictive skill.
 For all four sets of hindcasts, the atmosphere, land and sea ice initial states are specified in exactly the same way, using the instantaneous fields from the CFSR. For each set of experiments, four ensemble members are generated that differ in their atmosphere/land surface conditions, which are the instantaneous fields from 00Z of the first four days in April in the CFSR, respectively. For the OIC, to reduce the potentially negative effects of the mean biases in ocean analyses and the forecast model, and to make the predictions using OIC from different analyses directly comparable, we applied an anomaly ocean initialization strategy [e.g., Schneider et al., 1999] in these experiments. For this purpose, a monthly climatology for the CFSv2 ocean component was derived from the last 20 years of a 30-year simulation starting from the ocean-atmosphere CFSR state on November 1, 1980. The monthly anomalies of all variables from the ocean analyses are then calculated with respect to their own climatologies and superimposed on the CFSv2 monthly climatological states. The fields in March and April are averaged to represent the oceanic states at the start of April. Initializing the hindcasts using the monthly oceanic analyses is different from the operational practice of using an instantaneous analysis from the ODA system. We note that this choice may degrade the predictive skill in general because some of the oceanic wave characteristics are smoothed out. However, a set of test runs showed that using the monthly fields as OIC has little impact on the skill (see Figure S1 in the auxiliary material for details), so our approach appears to be feasible.
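The anomaly initialization described above can be illustrated with a minimal sketch. It assumes that the monthly analysis fields and both climatologies are already available as NumPy arrays on a common grid; the function name `anomaly_initial_state` and the array layout are hypothetical and are not taken from the CFSv2 workflow.

```python
import numpy as np

def anomaly_initial_state(analysis, analysis_clim, model_clim, year_idx):
    """Sketch of an anomaly ocean initial state for the start of April.

    analysis      : (n_years, 12, ...) monthly fields from one ocean analysis
    analysis_clim : (12, ...) monthly climatology of that same analysis
    model_clim    : (12, ...) monthly climatology of the forecast model
    year_idx      : index of the hindcast year along the first axis
    (Array layout is an assumption made for this illustration.)
    """
    MAR, APR = 2, 3  # zero-based month indices
    # Anomalies relative to the analysis's own climatology
    anomaly = analysis[year_idx] - analysis_clim
    # Superimpose the anomalies on the model climatology
    full_field = anomaly + model_clim
    # Average March and April to represent the state at the start of April
    return 0.5 * (full_field[MAR] + full_field[APR])
```

Because each analysis is referenced to its own climatology, mean differences between analyses are removed before the fields reach the model, which is what makes the four sets of hindcasts directly comparable.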
2.3. Hindcast Analysis Methods
 The ensemble means of the hindcasts are first created from the four different atmosphere-land initial states for each OIC, which will be referred to as the hindcast with respect to a given OIC (e.g., CFSR ensemble mean prediction). The next and more practical question is, what is the benefit of using the ensemble mean of multiple hindcasts from different ocean analyses? Here, these individual OIC hindcasts are averaged to form the multiple ocean ensemble mean, referred to as the ES_MEAN hindcast. Furthermore, taking into account the differences in prediction skill with different OIC, the Super_Ensemble method [Krishnamurti et al., 1999, 2000] is applied to form an objective and optimal combination of hindcasts, the Super_Ensemble hindcast, with different weights assigned to different OIC predictions based on their performance at each location. Given the relatively small number of hindcast cases, we have used all hindcasts to build the Super_Ensemble hindcast retrospectively, using multiple regression. The Super_Ensemble hindcast skill is used here as a diagnostic measure, which provides the upper limit of the predictive skill that the ensemble hindcasts can achieve. A cross-validation of the Super_Ensemble hindcast shows a lower hindcast skill than both the Super_Ensemble diagnostics and ES_MEAN (see Figures S2–S4 in the auxiliary material for details). We suspect that this result is due to the relatively small sample of hindcast cases used in building the Super_Ensemble model, so that the statistical relation established from it is still not very stable.
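The two ways of combining the OIC hindcasts can be sketched as follows for a single location and lead time. This is only an illustration under stated assumptions: the regression form (observed mean plus weighted member anomalies) follows the commonly cited superensemble formulation, the weights are fitted in-sample as in the diagnostic use described above, and the function names `es_mean` and `super_ensemble` are hypothetical.

```python
import numpy as np

def es_mean(hindcasts):
    """Equal-weight mean over OIC hindcasts.

    hindcasts : (n_oic, n_cases) ensemble-mean hindcasts, one row per OIC
    """
    return hindcasts.mean(axis=0)

def super_ensemble(hindcasts, obs):
    """In-sample multiple-regression combination of OIC hindcasts.

    hindcasts : (n_oic, n_cases) hindcast values at one location and lead
    obs       : (n_cases,) verifying observations
    Returns the fitted combination and the regression weights.
    """
    # Remove each member's own mean and the observed mean
    member_mean = hindcasts.mean(axis=1, keepdims=True)
    obs_mean = obs.mean()
    X = (hindcasts - member_mean).T      # (n_cases, n_oic)
    y = obs - obs_mean
    # Least-squares weights, one per OIC hindcast
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return obs_mean + X @ weights, weights
```

With only 29 hindcast cases and four predictors, weights fitted in this way can be unstable when applied out of sample, which is consistent with the lower cross-validated skill noted above.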
 The ENSO prediction skill is examined by validating the predicted SST anomalies (SSTA) against the analyzed (observed) SST from ERSST v3 [Smith et al., 2008]. The predicted SSTA is derived by subtracting a lead time-dependent climatology from the total SST, but no additional time smoothing is applied. In this paper, predictions for April will be defined as 0-month lead, those for May will be defined as 1-month lead, and so on.
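As an illustration of the anomaly definition and the lead-time convention above, the following minimal sketch assumes the hindcast SST is stored as an array with one row per start year and one column per lead month. The function names are hypothetical, and the climatology here is simply the mean over all hindcast years at each lead.

```python
import numpy as np

def hindcast_anomalies(sst):
    """Remove a lead-time-dependent climatology from hindcast SST.

    sst : (n_years, n_leads) total SST from April starts
    """
    # One climatological value per lead month, averaged over start years
    climatology = sst.mean(axis=0, keepdims=True)
    return sst - climatology

def target_month(lead, start_month=4):
    """Calendar month (1-12) verified at a given lead for April starts."""
    return (start_month - 1 + lead) % 12 + 1
```

For example, `target_month(0)` is 4 (April) and `target_month(8)` is 12 (December), in line with the convention that April is the 0-month lead.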
 As representative metrics of the overall hindcast skill, Figure 1 shows the anomaly correlation and root-mean-square error (RMSE) between the observed and predicted Niño-3.4 SSTA time series as a function of lead month. The anomaly correlations for hindcasts made from the four ocean analyses (colored curves in Figure 1) show that there are substantial differences in prediction skill when CFSv2 is initialized from different ocean analyses, on the order of 0.1 to 0.2 in the anomaly correlation at lead times longer than 2 months. In general, the hindcasts initialized from COMBINE-NV perform the best among the four sets, with anomaly correlation above 0.6 at all leads up to 12 months, which is better than the other hindcasts at almost all lead times. On the other hand, the predictive skill of CFSR seems lower than that of the other runs after 4 months, with anomaly correlation lower than COMBINE-NV by 0.2 at the 11-month lead. Hindcasts made from ORA-S3 and GODAS have comparable hindcast skill and typically fall between the other two. As a bulk measure, the average anomaly correlations (RMSE) during lead months 0–11 are 0.74 (0.63°C), 0.68 (0.68°C), 0.68 (0.68°C) and 0.64 (0.69°C) for the hindcasts with COMBINE-NV, ORA-S3, GODAS and CFSR, respectively.
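The two skill metrics used here can be computed per lead month as in the sketch below. It assumes predicted and observed Niño-3.4 SSTA are arranged as arrays of shape (years, leads); the function name `skill_by_lead` is hypothetical, and the anomaly correlation is taken to be the Pearson correlation over hindcast years.

```python
import numpy as np

def skill_by_lead(pred, obs):
    """Anomaly correlation and RMSE as a function of lead month.

    pred, obs : (n_years, n_leads) predicted and observed SST anomalies
    Returns two arrays of length n_leads.
    """
    # Center each lead's series before correlating
    p = pred - pred.mean(axis=0)
    o = obs - obs.mean(axis=0)
    acc = (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    # RMSE uses the anomalies as given, so mean errors also count
    rmse = np.sqrt(((pred - obs) ** 2).mean(axis=0))
    return acc, rmse
```

Averaging the returned arrays over leads 0 to 11 gives bulk measures of the kind quoted above.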
 Given the substantial difference in ENSO prediction skill with different ocean analyses, it is surprising to see that the predictive skill of the ES_MEAN (solid grey curves in Figure 1) is close to the best hindcast, that from the COMBINE-NV initial state (red curves in Figure 1). The skill averaged over lead months 0–11 in the ES_MEAN reaches 0.73 in anomaly correlation and 0.61°C in RMSE. This skill level is generally comparable to that of COMBINE-NV and clearly higher than that of the other three. It is noted that the skill of the ensemble mean is superior to that of its members if they have comparable skills, and/or the members have some degree of independence (i.e., they have uncorrelated errors). However, it is interesting to see that the skill of the ensemble mean is closer to that of the more skillful member. As expected, the Super_Ensemble hindcast has higher prediction skill compared to the ES_MEAN hindcast, with the anomaly correlation averaged over lead months 0–11 increasing to 0.77 and RMSE decreasing to 0.52°C. It is encouraging to see that the ES_MEAN hindcast skill is closer to the Super_Ensemble hindcast than most of the individual hindcasts. All these calculations suggest that an ensemble of predictions with different OIC can provide a more reliable prediction.
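The remark about comparable skill and independent errors can be made concrete with a short idealized calculation. The symbols below are assumptions introduced for illustration only: each of the N member hindcasts has an unbiased error e_i with the same variance σ², and every pair of errors has the same correlation ρ.

```latex
\begin{align}
\bar{e} &= \frac{1}{N}\sum_{i=1}^{N} e_i, \\
\operatorname{Var}(\bar{e})
  &= \frac{1}{N^{2}}\left[\sum_{i=1}^{N}\operatorname{Var}(e_i)
     + \sum_{i \neq j}\operatorname{Cov}(e_i, e_j)\right]
   = \frac{1}{N^{2}}\left[N\sigma^{2} + N(N-1)\rho\sigma^{2}\right] \\
  &= \frac{\sigma^{2}}{N}\left[1 + (N-1)\rho\right].
\end{align}
```

Under these assumptions, N = 4 members with uncorrelated errors (ρ = 0) reduce the error variance of the mean to σ²/4, while fully correlated errors (ρ = 1) give no reduction. The four OIC hindcasts share the same forecast model, so their errors are unlikely to be fully independent, and the actual gain would be expected to lie between these limits.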
 The horizontal distributions of the SST prediction skill are presented in Figure 2 for lead times of 2, 5, and 8 months. As expected, high correlation regions are generally located in the central equatorial Pacific. Among the six hindcasts, the Super_Ensemble performs the best, and ES_MEAN and COMBINE-NV are slightly worse but ahead of all other individual model hindcasts. This spatial distribution of the skill is pronounced even for short-lead hindcasts. For example, at the 2-month lead (Figure 2a), the correlations are greater than 0.8 over the eastern tropical Pacific in the COMBINE-NV and ES_MEAN hindcasts, with the Super_Ensemble hindcast covering a slightly larger area, but mostly below 0.8 in the GODAS, ORA-S3 and CFSR hindcasts. With the increase in lead time, the correlation drops first and faster in the eastern tropical Pacific. At the 5-month lead (Figure 2b), the correlation skill remains larger than 0.6 over a sizable region of the central Pacific for most hindcasts, while it falls to below 0.6 in the east. In the CFSR, the hindcast correlation skill drops below 0.6 almost everywhere at this lead month. At the 8-month lead (Figure 2c), the correlations have increased modestly in the central equatorial Pacific, in comparison to those at the 5-month lead (Figure 2b) for all cases. This rebound is likely seasonally dependent, reflecting the fact that the model is more skillful in predicting the mature ENSO state at its peak phase than in its developing stages (the 8-month lead time is for December), which is also seen in Figure 1a. It is encouraging that the correlation is greater than 0.7 over a relatively large region of the central equatorial Pacific near the dateline in Super_Ensemble, ES_MEAN, and COMBINE-NV.
 Beyond the statistical metrics, Figure 3 shows the predicted and observed Niño-3.4 SST anomalies during the period 1979–2007. Generally, all hindcasts capture the onset and development of major warm and cold events, and their transitions. However, the amplitudes of the SST anomalies are clearly underestimated in some strong events like the 1982/83 and 1997/98 El Niño events; for example, only GODAS seems to properly capture the amplitude of the 1982/83 El Niño. It is also clear that some anomalous episodes were missed by all hindcasts. For instance, all hindcasts produced false alarms in 1980–81 and 2001–2002, while the warm episode in 1994–95 was missed by all hindcasts. Whether these common failures point to some systematic problems of the forecast model/initial conditions or issues of inherent predictability merits further analysis. If the large amplitude of the extreme events is too sensitive to random perturbations that are unpredictable, one would expect the ensemble mean to under-predict them. It is possible that some individual ensemble members in a large enough ensemble can capture these events. Therefore, although the failure is clear, one may not simply attribute it to the inadequacy of the model or initial states while ruling out the potential effect of the inherent predictability limit. It seems that the model performs reasonably well in predicting near-neutral conditions in addition to the major events, such as the extended mild cold episode in 1996–1997 and the mild warm episode in 2004–2005. These skillful hindcasts may not be reflected in the anomaly correlation metric because major events with large amplitudes make larger contributions to the anomaly correlation.
 Since the same ocean model (MOM4) is used for creating the CFSR ocean analysis and in the CFSv2 model, a higher level of consistency is maintained in principle and is expected to benefit the hindcast performance when using the CFSR ocean analysis to initialize the CFSv2 model in comparison with using other ocean analyses. However, this assumption is not borne out, as shown in the above analyses, particularly for the hindcasts at lead times longer than 4 months. Examining the ocean initial state, we speculate that it is possibly due to some quantitative difference in the subsurface temperature anomalies between the CFSR and the other ODA states. Figure 4 shows the pairwise correlation maps of heat content anomalies (HCA) among the four ocean analyses, with the left panels (Figures 4a–4c) showing the correlations between CFSR and the other three analyses and the right panels (Figures 4d–4f) showing the pairwise correlations among the other three analyses. In comparison, the HCA changes among COMBINE-NV, ORA-S3 and GODAS (Figures 4d–4f) are closer to each other, with a relatively large region having correlation >0.95. On the other hand, the correlations are somewhat lower between the CFSR HCA and the other three analyses (Figures 4a–4c), with smaller regions having correlation >0.95, particularly in the equatorial region. In fact, the CFSR forecast is already worse than the other forecasts in the eastern equatorial Pacific at the 1- and 2-month leads. This is likely a local effect, which is consistent with the lower correlation in the equatorial region between the CFSR HCA and the other reanalyses (Figure 4). It seems that CFSR has problems in representing the subsurface thermal structure in the equatorial Pacific, which manifest as low forecast skill at the early lead times and degrade its forecast skill at all lead times.
Since the CFSR reanalysis was conducted in several parallel streams instead of in a continuous serial integration [Saha et al., 2010], there may be artificial discontinuities in the ocean heat content at the intersections of these segments [Xue et al., 2011b], which may in turn affect the predictive skill. In a recent study, Kumar et al. also found that there is a systematic difference in the CFSRR (CFS reanalysis and reforecast project (Saha et al. 2010, submitted manuscript, 2012)) hindcasts before and after 1999 because there is a clear shift in the CFSR ocean temperature analysis around that time, associated with the surface wind change due to the assimilation of the satellite-based ATOVS data after 1999. Based on this finding, they proposed that the CFSRR be analyzed with a different hindcast climatology over the period before/after 1999 in assessing the predictive skill. For consistency with the other predictions, our post-processing did not take this factor into account. The anomaly initialization procedure used in this study may also exacerbate the potential issues described above. Therefore, we would like to emphasize that our sensitivity experiments with CFSR OIC reported here are not necessarily representative of NCEP's CFSRR hindcasts. They differ in at least the following aspects: 1) initialization from monthly vs. instantaneous states; 2) anomaly vs. full initialization; 3) different reforecast periods (1979–2007 vs. 1982–2009).
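The pairwise correlation maps of heat content anomalies discussed for Figure 4 can be computed along the lines of the following sketch. It assumes that the HCA fields from all analyses are on a common grid and cover the same months; the function names and the dictionary layout are hypothetical.

```python
import numpy as np
from itertools import combinations

def correlation_map(a, b):
    """Pointwise temporal correlation between two anomaly fields.

    a, b : (n_time, n_lat, n_lon) heat content anomalies on a common grid
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    numerator = (a * b).sum(axis=0)
    denominator = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return numerator / denominator

def pairwise_maps(analyses):
    """Correlation map for every unordered pair of analyses.

    analyses : dict mapping an analysis name to its (n_time, n_lat, n_lon) array
    """
    return {
        (name1, name2): correlation_map(analyses[name1], analyses[name2])
        for name1, name2 in combinations(analyses, 2)
    }
```

With four analyses this yields six maps, matching the six panels of Figure 4; the fraction of grid points exceeding 0.95 in each map is one simple way to summarize the level of agreement.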
4. Conclusion and Discussion
 In this study, four ocean analyses are used to initialize a set of 12-month hindcast experiments (from 1979 to 2007) using the latest NCEP operational prediction model, CFS version 2. These ocean analyses are all produced by two of the leading prediction centers in the world: GODAS and CFSR from NCEP, and ORA-S3 and COMBINE-NV from ECMWF. For each OIC, four ensemble members are generated with slightly different atmosphere and land surface initial conditions, and the prediction skill for their ensemble average is assessed. The results indicate a non-negligible spread in the SST prediction skill among hindcasts made from different ocean analyses. Specifically, the ENSO prediction skill in terms of the anomaly correlation of the Niño-3.4 index can differ by as much as 0.1–0.2 at lead times longer than 2 months. An ES_MEAN hindcast, defined as the ensemble mean of all the hindcasts initialized with different ocean analyses, produces a more skillful SST prediction, with prediction skill equivalent to the best one, initialized with COMBINE-NV. Moreover, if different weights are applied to different predictions according to their performance, a higher skill can be achieved. This is the so-called Super_Ensemble method, and its superior skill in hindcast mode has been demonstrated in previous works [Krishnamurti et al., 1999, 2000]. Among the four ocean analyses, CFSR seems to lead to lower skill in ENSO prediction at lead times longer than 4 months. Our preliminary analysis shows that this behavior may be related to the difference in the CFSR subsurface heat content in comparison with the other three analyses, notably in the equatorial region.
 Ensemble prediction based on hindcasts from perturbed initial states has been successfully applied in numerical weather prediction [e.g., Toth and Kalnay, 1997; Buizza and Palmer, 1998; Buizza et al., 2008]. For seasonal prediction, an ensemble of five perturbed ocean states is used in the ECMWF seasonal forecasting system, but all the ensemble members are created by applying perturbations in the ocean analysis, which uses the same ocean model and data assimilation. Vialard et al. showed that this ensemble generation is insufficient to generate reliable forecasts. However, as far as we know, there have been no attempts to use perturbed oceanic initial states from different ocean reanalyses in ENSO forecasts. Here, the ensemble has been created by using four different ocean analyses and adding atmospheric perturbations. This should sample uncertainties in the ocean initial conditions arising from the ocean model, ocean data assimilation, forcing fluxes and observational data. Our results show that, given the level of uncertainty of the current ocean analyses, ensemble initialization with multiple ocean states from different sources is a useful way to improve the predictive skill for ENSO forecasts, particularly for long-lead prediction. Compared with random perturbations, our approach can represent more realistically the uncertainty in the oceanic initial states. Similarly, this approach may benefit seasonal forecasts in other ocean domains. In particular, considering the fact that the subsurface conditions estimated by the current state-of-the-art ODA systems have the largest uncertainty in the tropical Atlantic compared to the other two tropical oceans [Zhu et al., 2011], the ensemble initialization approach with large initial spread based on multiple ocean analyses may be more promising when used to improve the prediction of Tropical Atlantic Variability (TAV).
 Funding for this study is provided by grants from NSF (ATM-0830068), NOAA (NA09OAR4310058), and NASA (NNX09AN50G). Zhang is supported by an NSF grant (AGS-1061998) and an NOAA grant (NA08OAR4310885). The authors would like to thank J. Shukla for his guidance and support of this project. We are grateful to D. Straus and T. Tozuka for their suggestions and comments. We thank ECMWF and NCEP for providing their ocean data assimilation analysis datasets, which made this project possible. The authors gratefully acknowledge NCEP for the CFS v2 model made available to COLA. We also acknowledge NCEP's assistance in porting the code to the computing platforms at the NASA Advanced Supercomputing (NAS) division. We particularly wish to thank Y. Hou, S. Moorthi and S. Saha for technical assistance and necessary data sets and W. Lapenta and L. Uccellini for enabling the collaborative activities. Computing resources provided by NAS are also gratefully acknowledged.
 The Editor thanks two anonymous reviewers for assisting with the evaluation of this paper.