Multicycle ensemble forecasting of sea surface temperature


  • Gary B. Brassington

    1. Centre for Australian Weather and Climate Research, Bureau of Meteorology, Sydney, New South Wales, Australia
    • Corresponding author: G. B. Brassington, Centre for Australian Weather and Climate Research, Bureau of Meteorology, PO Box 413, Darlinghurst, Sydney, NSW 1300, Australia. (



[1] A novel extension to time-lagged ensemble forecasting called multicycle ensemble forecasting improves the independent sampling of forecast model errors. Multicycle is defined such that each forecast cycle is independent of the previous forecast cycle. For an M cycle system the background field for each cycle is from a model hindcast M cycles earlier. The model errors have a factor M longer period to grow compared with a sequential system; however, the increased independence in the forecast model errors provides weighted ensemble averages with greater skill and reliability than the 0 lag forecast and a good spread-error relationship. This cost-efficient technique is relevant to global ocean forecasting where an ensemble method is computationally prohibitive.

1 Introduction

[2] Ocean circulation outside of the tropics is baroclinically unstable, giving rise to large populations of mesoscale eddies as observed by satellite altimetry [Chelton et al., 2011]. Globally, 35,000 eddies over a 16 year record were identified and tracked with spatial scales >100 km and persistence >16 weeks. Mesoscale variability extends beyond the observable spectrum of altimetry [Le Traon et al., 2001] with a continuum of scales. For example, eddies observed in the Tasman Sea by drifting buoys have phase speeds of ~0.45 m s−1, where U/c approaches unity [Brassington, 2010; Brassington et al., 2010]. This portion of the spectrum in an ocean model is weakly constrained by assimilation of the observing system.

[3] Modeling this nonlinear dynamical system can lead to rapid and nonuniform error growth from random errors in the initial and boundary conditions. A single deterministic forecast is therefore one member of a population of possible forecasts. When error growth rates are large, or the forecast period long, a single deterministic forecast is unreliable and a statistical description of the population becomes essential. Ensemble forecasting has become routine in both numerical weather prediction (NWP) [e.g., Molteni et al., 1996; Toth et al., 1997] and seasonal forecasting [e.g., Vialard et al., 2005; Hudson et al., 2013] for this purpose. However, at present ensemble forecasting remains computationally prohibitive for operational global ocean forecasting.

[4] Several low-cost ensemble generation methods have been used in NWP and seasonal forecasting such as time-lagged ensembles [e.g., Hoffman and Kalnay, 1983], super ensembles [e.g., Krishnamurti et al., 1999], and bred vectors [e.g., Toth and Kalnay, 1997; O'Kane et al., 2011]. Time-lagged ensembles are directly applicable to a single sequential operational forecast system. However, Brankovic et al. [1990] determined that systematic errors reduced the gain from ensemble averaging, i.e., the time-lagged ensemble was ineffective at independently sampling the random errors. Lu et al. [2007] later demonstrated greater benefit from time-lagged ensembles for short-range prediction if a weighted ensemble average is introduced.

[5] A novel extension to time-lagged ensemble forecasting referred to as multicycle ensemble is described in section 2. The results of forecasting sea surface temperature (SST) using a multicycle ensemble are presented in section 3, including the skill scores and spread-error relationship. SST, the variable most constrained by the observing system, is chosen to demonstrate the advantages of this strategy for forecasting. The results are briefly discussed in section 4.

2 Multicycle Forecast System

[6] A multicycle (or M cycle) forecast system uses M cycles with only one cycle performed per forecast. The M cycles are performed sequentially and then repeated every M forecast cycles. Each of the M cycles is independent in the sense that the background field for the data assimilation is obtained from a model hindcast M cycles earlier, and no error information is used from the other M-1 cycles. It is assumed that the configuration of the cycles is identical.
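The cycling described above can be sketched in a few lines of Python. This is only an illustration under the assumption of one forecast per day; the function name `cycle_schedule` and the handling of the first M runs (no background available yet) are hypothetical and not taken from the operational system.

```python
from datetime import date, timedelta


def cycle_schedule(start, n_days, m_cycles=4):
    """List (run_date, cycle_index, background_date) for daily forecasts.

    cycle_index repeats 0..M-1. The background for each run is taken from
    the hindcast of the same cycle, M days earlier. For the first M runs
    there is no earlier run of that cycle, so background_date is None
    (a cold start in this toy setup, which is an assumption).
    """
    schedule = []
    for d in range(n_days):
        run = start + timedelta(days=d)
        background = run - timedelta(days=m_cycles) if d >= m_cycles else None
        schedule.append((run, d % m_cycles, background))
    return schedule


if __name__ == "__main__":
    for run, cycle, background in cycle_schedule(date(2012, 1, 5), 8):
        print(run, "cycle", cycle, "background from", background)
```

In this sketch no run ever reads a background from one of the other M-1 cycles, which is the sense in which the cycles are independent.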

[7] The multicycle ensemble method will be demonstrated using the Ocean Model, Analysis and Prediction System version 2 (OceanMAPSv2), the Australian Bureau of Meteorology's (ABOM) operational ocean forecast system [Brassington et al., 2012]. OceanMAPSv2 is a four-cycle system with one cycle performed each day and repeated every fourth day (see Figure S1 in the supporting information). OceanMAPSv2 comprises the Geophysical Fluid Dynamics Laboratory Modular Ocean Model version 4.1 [Griffies, 2010] configured as a global model with 0.1° × 0.1° resolution in the Australian region (90°E to 180°E, 75°S to 16°N) [Schiller et al., 2008]. Data assimilation uses an ensemble optimal interpolation scheme referred to as the BLUElink Ocean Data Assimilation System (BODAS) [Oke et al., 2008, 2012]. BODAS assimilates real-time observations of sea surface height anomaly from altimetry (e.g., Jason-1, Jason-2, and Cryosat-2), sea surface temperature (e.g., Windsat, NOAA-18, NOAA-19, and METOP-A), and in situ profiles (e.g., Argo, XBT, and Moorings). The surface forcing is obtained from the ABOM's operational system, ACCESS-G/APS1 [Puri et al., 2013]. The components of OceanMAPSv2 have similarities to other international systems [Dombrowsky et al., 2009; Hurlburt et al., 2009]; however, the multicycle design is unique.

[8] Each ocean forecast cycle consists of a behind real-time (BRT) analysis 9 days behind real time, using a symmetric data window for altimetry, satellite SST, and profiles of ± (5, 1, 3) days, respectively, followed by a 4 day hindcast. The increment fields are introduced by an adaptive initialization method [Sandery et al., 2011] over the 24 h period of the analysis. The BRT hindcast is followed by a near-real-time analysis 5 days behind real time, using an asymmetric data window of −7/+3 days for altimetry, followed by a 5 day hindcast to real time and then a 7 day forecast (see Figure S1). This system was tested for a set of daily hind cycles, 5 January to 31 August 2012, based on archived real-time observations and 3-hourly hindcast/forecast surface fluxes from the operational ACCESS-G/APS1 system.
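As a reading aid, the timeline of a single cycle can be laid out relative to the forecast base date. The sketch below simply transcribes the offsets stated in the paragraph above; the function name `cycle_timeline` and the dictionary keys are hypothetical, and the near-real-time windows for SST and profiles are left out because they are not stated in the text.

```python
from datetime import date, timedelta


def cycle_timeline(base):
    """Key dates of one forecast cycle relative to the base (real-time) date."""

    def day(offset):
        return base + timedelta(days=offset)

    return {
        # Behind real-time (BRT) analysis and its 4 day hindcast.
        "brt_analysis": day(-9),
        "brt_hindcast": (day(-9), day(-5)),
        # Symmetric BRT data windows of +/- (5, 1, 3) days about the analysis.
        "brt_windows": {
            "altimetry": (day(-14), day(-4)),
            "sst": (day(-10), day(-8)),
            "profiles": (day(-12), day(-6)),
        },
        # Near-real-time (NRT) analysis, 5 day hindcast, 7 day forecast.
        "nrt_analysis": day(-5),
        # Asymmetric -7/+3 day altimetry window about the NRT analysis.
        "nrt_altimetry_window": (day(-12), day(-2)),
        "nrt_hindcast": (day(-5), day(0)),
        "forecast": (day(0), day(7)),
    }


if __name__ == "__main__":
    for name, value in cycle_timeline(date(2012, 6, 1)).items():
        print(name, value)
```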

3 Multicycle Ensemble Results

[9] A multicycle system is time lagged resulting in different expected accumulated error for a given forecast day [Brassington et al., 2012]. A weighted ensemble average is applied to provide an optimal linear combination as has been previously argued in numerical weather prediction [Lu et al., 2007]. A weighted least squares analysis of the four-cycle time-lagged ensemble of SST obtains the squared sum,

display math(1)

where oi is the observed SST anomaly, Σk ωk T'k (k = 1, …, M) is the weighted ensemble mean SST anomaly with the weights ωk constrained to sum to unity, H is a linear interpolation operator from model space to observation space, T'k is the modeled SST anomaly of the kth cycle, b is the bias, M is the number of cycles, and N is the number of observations. SST anomalies are defined relative to a seasonal climatology [Ridgway et al., 2002]. Differentiation with respect to the bias and the independent ensemble weights provides a set of M + 1 equations that is invertible. With some simple algebra the coefficients can be shown to be composed of the observed and modeled mean temperatures as well as their variances and covariances. It is straightforward to calculate the weights iteratively for multiple hind cycles and larger sample size N.
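A minimal sketch of such a constrained fit is given below. It assumes the squared sum has the form Σi (oi − Σk ωk H T'k − b)², so that b is the mean of observation minus weighted ensemble mean; this sign convention is an assumption. The unit-sum constraint is imposed here by substituting ωM = 1 − Σk<M ωk, which is one of several equivalent routes and not necessarily the one used in the paper. The function names `solve_linear` and `fit_weights` are hypothetical, and the member values are assumed to be already interpolated to the observation locations.

```python
def solve_linear(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [rhs] for row, rhs in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_weights(obs, members):
    """Least squares weights (summing to one) and bias for M lagged members.

    obs     : list of N observed anomalies
    members : list of M lists, each with N modeled anomalies at the
              observation locations
    Returns (weights, bias).
    """
    m = len(members)
    last = members[-1]
    # With w_M = 1 - sum(w_1..w_{M-1}) the problem becomes an ordinary
    # least squares fit of (o - T_M) on (T_k - T_M), k < M, plus a constant.
    rows = [[members[k][i] - last[i] for k in range(m - 1)] + [1.0]
            for i in range(len(obs))]
    target = [o - t for o, t in zip(obs, last)]
    # Normal equations: sums of products, i.e. (co)variance-like terms.
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(m)] for p in range(m)]
    aty = [sum(r[p] * t for r, t in zip(rows, target)) for p in range(m)]
    solution = solve_linear(ata, aty)
    free = solution[:m - 1]
    return free + [1.0 - sum(free)], solution[m - 1]
```

Because the normal equations only involve sums over observations, the sums can be accumulated across hind cycles before solving, which is consistent with the iterative calculation mentioned above.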

[10] Applying equation (1) to a four-cycle (i.e., M = 4) time-lagged ensemble using values from the Australian region (50°S to the equator, 90°E to 180°E), the weights for each of the four members are shown in Figure 1 for the hindcast/forecast hours −120 to 72. The weights are derived using all the daily hind cycles for the period 1 March to 31 August 2012. The global bias (not shown) monotonically increases in magnitude from −0.08 to −0.14°C with increasing hindcast period.

Figure 1.

The weights for a constrained least squares four-cycle ensemble average, equation (1), applied to sea surface temperature for the hindcast/forecast hours −120 to 72. The weights for the 0, 24, 48, and 72 h lagged cycles are shown as a circle, square, diamond, and triangle, respectively. The weights are based on a set of daily hindcasts, 1 March to 31 August 2012.

[11] The maximum weighting for the 0 h lagged hindcast (ω1), corresponding to −96 h, accounts for ~58% of the ensemble average. At −120 h, ω1 is reduced to ~45% of the ensemble average, largely compensated by increases in the weights for the 24 h and 48 h lag hindcasts. This behavior is attributed to the 24 h initialization of the BODAS increments. Subsequent to −96 h, the weighting ω1 declines monotonically but appears to have an asymptote above or equal to the theoretical equal weighting of 0.25. The weight for the 24 h lag hindcast is approximately constant, while the weights for the 48 h and 72 h lag hindcasts both increase monotonically toward an asymptote. At the 72 h forecast the distribution of weights is (ω1, ω2, ω3, ω4) = (0.33, 0.26, 0.22, 0.19). As the hindcast period increases, the weighted ensemble mean converges toward a simple average of the four-cycle members, i.e., toward a homogeneous four-member ensemble forecast system.

[12] Figure 2 provides an example of the 24 h weighted ensemble mean SST anomaly for the 000 h forecast for 1 June 2012 and for two regions: the Tasman Sea (45°S to 25°S, 147°E to 167°E; Figures 2a–2f) and the southeast Indian Ocean (40°S to 20°S, 99°E to 119°E; Figures 2g–2l). The four cycles are shown in Figures 2a–2d and 2g–2j, respectively. The corresponding super observations from the BRT analysis in Figures 2e and 2k are shown as circles colored according to their temperature anomaly. The weighted ensemble averages applying the weights (ω1, ω2, ω3, ω4) = (0.37, 0.27, 0.20, 0.16) are shown in Figures 2f and 2l. The top left (right) corner of each forecast and weighted ensemble mean includes values for the root mean square error (RMSE), anomaly cross-correlation (aCC), and skill score (SS) as defined in equation (2). Each of the 24 h forecasts shows gradients up to the grid scales (0.1°), while the average as expected is smoother. However, the ensemble mean shows a 7–9% reduction in RMSE and a 14–16% increase in aCC and SS over the 0 h lagged cycle.

Figure 2.

Modeled and observed sea surface temperature anomalies from the 000 h OceanMAPSv2.1 forecast for 1 June 2012 and for (a–f) the Tasman Sea (45°S to 25°S, 147°E to 167°E) and (g–l) the southeast Indian Ocean (40°S to 20°S, 99°E to 119°E). The SST anomalies are relative to the Commonwealth Scientific and Industrial Research Organisation Atlas for Regional Seas [Ridgway et al., 2002]. Figures 2a–2d and 2g–2j correspond to the 0, 24, 48, and 72 h lagged forecasts. Figures 2e and 2k represent the BRT super observations, each indicated by a circle colored by its temperature anomaly. Figures 2f and 2l represent the weighted ensemble mean. The top left (right) corner of each figure shows the values for RMSE, aCC, and SS, respectively.

[13] The skill scores of the four-cycle weighted ensemble mean (SSens) and the 0 h lag hindcast (SS1) are shown in Figure 3, based on the skill score defined as

display math(2)

where σ̂f = σf/σr is the forecast standard deviation (σf) normalized by the observed standard deviation (σr), aCC is the anomaly cross correlation, and aCC0 = 1 is the limiting correlation following Taylor [2001]. Figure 3 shows the median skill score and the 95th percentile range of skill scores using all the daily hind cycles for the period 1 March to 31 August 2012. The median skill score for SSens is greater than SS1 for all forecast hours and declines in skill at a reduced rate. The lowest 95th percentile SSens is greater than the highest 95th percentile for SS1 from the 48 h forecast onward. Importantly, the range of skill scores for the ensemble average is 55% to 66% less than that of the 0 h lag hindcast, indicating greater reliability.
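For readers who want to experiment with a score built from these ingredients, the sketch below uses one of the skill score forms given by Taylor [2001], SS = 4(1 + aCC) / [(σ̂f + 1/σ̂f)² (1 + aCC0)]. Whether equation (2) is exactly this variant is an assumption. The function names `anomaly_stats` and `taylor_skill` are hypothetical, and aCC is computed here as a centered (Pearson) correlation of the anomalies, which is also an assumption.

```python
import math


def anomaly_stats(forecast, observed):
    """Return (RMSE, aCC, normalized standard deviation) for paired anomalies."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    var_f = sum((f - mean_f) ** 2 for f in forecast) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    cov = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(forecast, observed)) / n
    rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)
    acc = cov / math.sqrt(var_f * var_o)
    sigma_hat = math.sqrt(var_f / var_o)
    return rmse, acc, sigma_hat


def taylor_skill(acc, sigma_hat, acc0=1.0):
    """Assumed Taylor-type skill score: 1 for a perfect forecast, 0 for aCC = -1."""
    return 4.0 * (1.0 + acc) / ((sigma_hat + 1.0 / sigma_hat) ** 2 * (1.0 + acc0))
```

The score penalizes both low correlation and a mismatch in variance: a forecast with perfect correlation but twice the observed standard deviation scores below one in this form.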

Figure 3.

The median (black line) and 95th percentile range (colored bars) of the skill score, equation (2), for sea surface temperature seasonal anomaly from OceanMAPSv2.1 daily hindcasts, 1 March to 31 August 2012. The red bars represent the 0 h lagged cycle hindcast corresponding to each day of the hindcast period and forecast hours −120 to 72. The blue bars represent the four-cycle weighted mean for the same period and forecast hours.

[14] A spread-error relationship that equates the expected ensemble variance and the expected squared error of the ensemble mean for small ensemble sizes is outlined in Wilks [2006]. Adjusting this formula for a weighted ensemble average and variance gives

display math(3)

where E is the expectation and other terms are defined as per equation (1). Applying equation (3) to the four-cycle time-lagged ensemble, the spread-error relationship is shown in Figure 4. The solid circles (hindcast) or squares (forecast) represent the median value from a set of daily hind cycles, 1 March to 31 August 2012, for the hindcast/forecast hours −120 to 48. The 95th percentile range of values is shown by the corresponding cross hairs. The −120 and −96 h values show that the ensemble variance is half the expected squared error. During the subsequent hindcast period the median values converge to the line of direct proportionality. The rate of growth of both quantities decreases monotonically, while the 95th percentile range for both the expected ensemble variance and expected squared error increases with increasing hindcast period.
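The two quantities compared in a spread-error diagram can be estimated as in the sketch below. It computes the sample-mean weighted ensemble variance and the sample-mean squared error of the weighted ensemble mean. It deliberately omits any small-ensemble correction factor and any bias term that equation (3) may contain, so it is an assumption-laden illustration rather than the formula used in the paper. The function name `spread_and_error` is hypothetical.

```python
def spread_and_error(obs, members, weights):
    """Return (mean weighted ensemble variance, mean squared error).

    obs     : list of N observed anomalies
    members : list of M lists of N modeled anomalies at the observations
    weights : list of M weights assumed to sum to one
    """
    n = len(obs)
    variance_sum = 0.0
    error_sum = 0.0
    for i in range(n):
        values = [member[i] for member in members]
        mean = sum(w * v for w, v in zip(weights, values))
        # Weighted spread of the members about the weighted mean.
        variance_sum += sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
        # Squared error of the weighted mean against the observation.
        error_sum += (mean - obs[i]) ** 2
    return variance_sum / n, error_sum / n


if __name__ == "__main__":
    spread, error = spread_and_error([2.0], [[1.0], [3.0]], [0.5, 0.5])
    print(spread, error)
```

Points where the first value is about half the second, as reported for −120 and −96 h, would indicate an underdispersive ensemble in this simplified picture.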

Figure 4.

The weighted expected squared ensemble mean error of sea surface temperature against the weighted expected ensemble variance for a four-cycle ensemble hindcast as per equation (3). The solid circles (hindcast) or squares (forecast) represent the median value from a set of daily hindcasts, 1 March to 31 August 2012, for the hindcast/forecast hours −120 to 48. The 95th percentile range is shown by the corresponding cross hairs.

4 Conclusion

[15] In this paper we present a novel extension to time-lagged ensembles by introducing a multicycle system. This strategy retains the cost efficiency of the time-lagged ensemble, as it is applied to a single daily forecast cycle. However, by construction the M cycles improve the sampling of independent random model errors, reducing the shortcomings of traditional time-lagged forecasting.

[16] The Bureau of Meteorology's ocean forecast system, OceanMAPSv2, is set up as a four-cycle system. The four cycles use an identical configuration such that the systematic errors are unchanged. The global systematic error (bias) of the weighted four-cycle ensemble average monotonically increases in magnitude over the forecast period from −0.08 to −0.14°C. The least squares weights demonstrate that there is an increasing contribution from the three time-lagged cycles with increasing hindcast period. The weighted ensemble average of SST anomalies provides improved skill over the latest cycle in all aspects: the median value of the skill, the rate of decrease in skill, and the 95th percentile range of skill. The skill of the four-cycle ensemble for the 72 h forecast is equivalent to the skill of the −24 h hindcast from the 0 h lagged cycle, indicating that the disadvantage of a 4 day analysis cycle has been recovered. It is noted that both weighted and uniform ensemble averages have more skill than the latest forecast but the former is optimal (see Figure S2).

[17] The moving time window of observations is the prime source of perturbations. A periodogram (see Figures S3 and S4) illustrates that the scales < ~0.6° contain random information that is independently sampled by the four-cycle system. The scales ~0.6° < λ < 4° transition from least constrained to maximally constrained by the observing system. We note that the use of common hindcast fluxes and a common model excludes other sources of error such that the total uncertainty of SST will be underestimated.

[18] For sea surface temperature anomalies in the Tasman Sea region, the weighted ensemble average for the 000 h forecast robustly improves the RMSE by 7–9% and the anomaly cross correlation and skill score by 14–16% over the 0 h lagged cycle. In addition, the spread-error relationship approaches direct proportionality with increasing hindcast period. The results presented for sea surface temperature from Australia's multicycle operational ocean forecasting system indicate that a multicycle system is a cost-efficient strategy for forecasting sea surface temperature.


[19] The author thanks two anonymous reviewers for their valuable comments on this paper. The science and technical team for the BLUElink project, the Bureau of Meteorology, the Commonwealth Scientific and Industrial Research Organisation and the Royal Australian Navy are also acknowledged.

[20] The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.