An innovative multi-model fusion technique, the three-dimensional super-ensemble, is proposed to improve short-term ocean temperature forecasts. In this method, a Kalman Filter is used to adjust three-dimensional model weights over a past learning period, which gives more importance to recent observations and takes into account spatially varying model skills. The predictive performance is evaluated against SST analyses, CTD casts and glider tracks collected during the Ligurian Sea Cal/Val 2008 experiment. Statistical results show not only a very significant bias reduction of this multi-model forecast in comparison with the individual models, their ensemble mean and a single-weight-per-model version of the super-ensemble, but also an improvement of other pattern-related skills. In a 48-h forecast experiment, and with respect to the ensemble mean, surface and subsurface root-mean-square differences with observations are reduced by 57% and 35%, respectively, making this new technique a suitable non-intrusive post-processing method for multi-model operational forecasting systems.
1. Introduction
 Operational ocean forecasting systems aim to provide end-users (e.g. the manager of a fishery or an offshore platform, the commander of a vessel, the coordinator of a search-and-rescue operation) with the most accurate description of the sea state at a given time in the near future, typically up to 72 h. In recent years, an increasing number of operational forecasting systems have become available (e.g. MERCATOR, MFS, NCOM), each potentially providing a different prediction. Based on the assumption that these models may have complementary forecasting skills, multi-model fusion methods have emerged, aiming to provide a single forecast with improved overall performance. As shown by Galmarini et al., such a reconciliation of different model predictions into a unique scenario is possible and may help decision makers. Among these combination techniques, the super-ensemble (SE) method, which assumes that model skills persist over short forecast periods, was introduced in meteorology by Krishnamurti et al. The method is based on multiple linear regressions at observation points over a training period. More recently, the SE has also been successfully applied in oceanography to the prediction of temperature [Rixen et al., 2009], acoustic properties [Rixen and Ferreira-Coelho, 2006] and significant wave height [Lenartz et al., 2010]. Besides SE techniques, Logutov and Robinson proposed an adaptive Bayesian model fusion for ocean forecasting, whereas Rixen and Ferreira-Coelho and Vandenbulcke et al. introduced and developed the hyper-ensemble concept, which combines models of different nature to produce a forecast, in these cases of the surface drift. Ensemble runs, which involve either simulations of different models or perturbed simulations of the same model and are often used to quantify the uncertainties in predictions [Lermusiaux et al., 2006], can also be seen as a particular fusion technique, with all members equally weighted.
 In this paper, SE techniques are improved by introducing the three-dimensional spatial variability of model weights in an oceanographic context characterized by sparsely distributed observations. The new method consists of a single optimization over the whole modeling domain, made possible by the specification of weight error spatial covariances. It is evaluated for temperature forecasts in a coastal region, using three ocean circulation models and the LSCV08 field experiment data set. Models and observations are presented in section 2.1, whereas the method and experimental set-up are described in section 2.2. Results are shown in section 3 and conclusions are drawn in section 4.
2. Data and Method
2.1. Data: Models and Observations
 The Ligurian Sea Cal/Val Cruise 2008 (LSCV08), conducted in September and October 2008 off La Spezia (Italy), is used as a data set of opportunity to evaluate the three-dimensional super-ensemble (3DSE) technique. Two types of instruments were used to profile the ocean subsurface: a Sea-Bird CTD and two Slocum gliders, deployed by teams from NURC (NATO Undersea Research Centre) and Rutgers University, respectively. The CTD casts, mainly located in the littoral zone, provide temperature and salinity data between the surface and 100 m. The glider tracks cover the same vertical range, but in somewhat more offshore waters (Figure 1).
 Surface data are provided by remote sensing products from the OSTIA (Operational Sea Surface Temperature and Sea Ice Analysis) project. These products combine satellite data provided by the GHRSST (Group for High-Resolution Sea Surface Temperature) and in situ observations to produce daily SST analyses for the global ocean [Stark et al., 2007] at a resolution of 1/20°. Their accuracy is around 0.8°C in our area of interest.
 Three operational ocean forecasting systems are used in this study: MERCATOR, MFS and RELO-NCOM. Their main characteristics are summarized in Table 1. More details can be found in Text S1 of the auxiliary material. It is also worth mentioning that none of these systems assimilates the subsurface data set used in our experiments.
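Table 1. Main characteristics of the three operational forecasting systems (MERCATOR, MFS and RELO-NCOM): Δx, horizontal resolution; Δz, vertical resolution; time step between two successive outputs; time step between two successive forecast releases. Vertical resolution ranges listed in the table: 1–450 m, 3–300 m and 1–25 m.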
2.2. Method: 3DSE
 The 3DSE method consists of a three-dimensional, dynamically adaptive weighting of model outputs over a specified recent learning period. A Kalman Filter (KF) is used to minimize the distance between the weighted linear combination of the model outputs and the observations during that period. Assuming that the combination remains optimal in the short term, the model weights are then frozen and a multi-model prediction is produced for the forecast period. As the data assimilation (DA) literature covers the KF extensively, we confine ourselves to the interpretation of its equations within this particular framework.
 The main difference from usual oceanographic implementations of the KF lies in the workspace. Instead of containing ocean variables at each point of the model grid, the state vector x contains the weights for each model and each variable at each point of a common 3DSE grid, onto which the different models are first interpolated. The length of the state vector is thus NT = NM × NV × NP, where NM is the number of models, NV the number of variables and NP the number of grid points. In this study the focus is on temperature, so that NV = 1. The method could be implemented in multivariate situations, but this aspect still requires dedicated investigation; in particular, the increased number of degrees of freedom is expected to require a larger data set and more computational resources. The observation operator H, which projects the space of model weights onto the space of observations, also differs from its common use. In the 3DSE, H contains the values of the individual model variables (here only temperature) at the observation points. As a result, and in accordance with KF theory, Hx is the weighted linear combination produced by the 3DSE, interpolated at the observation points.
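Once the learning period ends, the frozen weights are simply applied to each model's forecast fields. A minimal NumPy sketch of that forecast step follows, assuming the model outputs have already been interpolated onto the common 3DSE grid and flattened to NP points; the function name `se_forecast` and the array layout are illustrative choices, not the authors' code.

```python
import numpy as np


def se_forecast(weights: np.ndarray, model_fields: np.ndarray) -> np.ndarray:
    """Combine model forecasts with frozen per-grid-point weights.

    weights:      shape (NM, NP), one weight per model and grid point
    model_fields: shape (NTIME, NM, NP), model temperatures on the common
                  3DSE grid for each forecast output time
    returns:      shape (NTIME, NP), the multi-model forecast
    """
    if weights.shape != model_fields.shape[1:]:
        raise ValueError("weights must match the (model, grid point) axes")
    # Sum over the model axis m: F[t, p] = sum_m w[m, p] * T[t, m, p]
    return np.einsum("mp,tmp->tp", weights, model_fields)
```

With every weight equal to 1/NM, the combination reduces to the ensemble mean, which is also the 3DSE starting point before any training.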
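The bookkeeping of this weight-space formulation can be illustrated for NV = 1. The sketch below assumes a model-major ordering of x (all NP weights of the first model, then those of the second, and so on) and, for simplicity, that each observation coincides with a grid point instead of being interpolated; `state_index` and `build_H` are hypothetical helper names.

```python
import numpy as np


def state_index(m: int, p: int, n_points: int) -> int:
    """Position of the weight of model m at grid point p in x (NV = 1).

    Model-major ordering is an assumption made for this sketch.
    """
    return m * n_points + p


def build_H(model_values_at_obs: np.ndarray, obs_points: np.ndarray,
            n_points: int) -> np.ndarray:
    """Observation operator H of shape (NO, NM * NP).

    model_values_at_obs: shape (NO, NM), each model's temperature at each observation
    obs_points:          shape (NO,), grid point assumed to hold each observation
                         (nearest-point simplification; the 3DSE interpolates)
    """
    n_obs, n_models = model_values_at_obs.shape
    H = np.zeros((n_obs, n_models * n_points))
    for i in range(n_obs):
        for m in range(n_models):
            # Row i picks the weight of model m at the observation's grid point
            H[i, state_index(m, obs_points[i], n_points)] = model_values_at_obs[i, m]
    return H
```

Applying H to a vector of uniform weights 1/NM then returns the ensemble-mean temperature at each observation point, i.e. Hx is the weighted combination seen by the observations.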
 At the analysis step, i.e. at the end of a time interval Δt (here 6 h) during which new data become available, the Kalman gain matrix K is computed and the state vector and associated error covariance matrix are updated according to the following equations:
$$K = P^f H^T \left(H P^f H^T + R\right)^{-1}$$
$$x^a = x^f + K\left(y^o - H x^f\right)$$
$$P^a = \left(I - K H\right) P^f$$
where P is the model weight error covariance matrix, R the observation error covariance matrix and y^o the vector of observations. Superscripts f and a denote KF forecast and analysis values, respectively, whereas T indicates the matrix transpose. For lack of additional information about the observational error, its covariance matrix is taken to be diagonal.
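The standard KF analysis step (gain, state update, covariance update) carries over unchanged to weight space. The NumPy sketch below is one straightforward way to write it; `kf_analysis` is a hypothetical name, and solving a linear system instead of explicitly inverting the innovation covariance is a choice of this sketch, not necessarily of the original implementation.

```python
import numpy as np


def kf_analysis(xf, Pf, H, R, yo):
    """One KF analysis step in the space of model weights.

    xf: (NT,) forecast weights        Pf: (NT, NT) weight error covariance
    H:  (NO, NT) model values at obs  R:  (NO, NO) obs error covariance
    yo: (NO,) observations
    """
    S = H @ Pf @ H.T + R                  # innovation covariance (symmetric)
    # K = Pf H^T S^{-1}, obtained as the transpose of S^{-1} (H Pf)
    K = np.linalg.solve(S, H @ Pf).T
    xa = xf + K @ (yo - H @ xf)           # weights pulled toward the observations
    Pa = (np.eye(len(xf)) - K @ H) @ Pf   # reduced weight error covariance
    return xa, Pa
```

In a one-point, two-model toy case, the analysis shifts weight toward the model closer to the observation and shrinks the trace of the weight error covariance.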
 During the prediction step, the state vector is left unchanged, in accordance with the hypothesis that model weights are stationary in the short term. The model M describing the evolution of the weights between two analysis steps is thus the identity matrix I. To account for the error in M, Pa is increased by the model error covariance matrix Q, which is Pa multiplied by a lead-time-dependent coefficient α. The coefficient α is chosen such that the trace of Q equals the initial trace of P0 for a prediction time interval of duration τ (here 3 days). This definition of Q prevents the weight error correlations generated by the filter at former analysis steps from being altered:
$$x^f = x^a, \qquad P^f = M P^a M^T + Q = P^a + \alpha P^a$$
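A possible sketch of this prediction step is given below. The text fixes only the end point (tr Q = tr P0 when the prediction interval equals τ); the linear scaling of α with Δt/τ used here is an assumption, and `kf_prediction` is a hypothetical name.

```python
import numpy as np


def kf_prediction(xa, Pa, P0, dt_hours=6.0, tau_hours=72.0):
    """Weight-space KF prediction step with persistence model M = I.

    Q = alpha * Pa keeps the correlation structure of Pa. The linear scaling
    alpha = (dt / tau) * tr(P0) / tr(Pa) is an assumption of this sketch;
    it gives tr(Q) = tr(P0) when dt = tau.
    """
    xf = xa.copy()                        # weights persist between analyses
    alpha = (dt_hours / tau_hours) * np.trace(P0) / np.trace(Pa)
    Pf = Pa + alpha * Pa                  # Pf = M Pa M^T + Q with M = I
    return xf, Pf, alpha
```

Because Q is proportional to Pa, the inflation scales every variance by the same factor 1 + α and leaves the weight error correlations untouched.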
 This prediction/analysis cycle is repeated until the end of the learning period. The 3DSE is initialized as follows. Each weight is set to 1/NM, so that the first forecast corresponds to the model Ensemble Mean (EM). The initial error covariance matrix P0 is block diagonal, with one block per model. Each block is the product of a spatial correlation matrix and a variance σ_P². This correlation matrix is built from a decreasing exponential function of distance, parameterized by prescribed horizontal and vertical correlation lengths. Here, σ_P² = 0.1, and the horizontal and vertical correlation lengths are 1° and 20 m, respectively. No inter-model weight error covariance is considered at initialization; such covariances nevertheless develop during the assimilation process. The observation error variance is σ_OST² = 1 °C² for OSTIA images and σ_CTD/GLI² = 0.5 °C² for CTD and glider profiles.
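The initialization can be sketched as follows. The text only specifies a decreasing exponential with horizontal and vertical correlation lengths; the separable form exp(-dh/Lh)·exp(-|dz|/Lv), the planar distance computed directly in degrees and the model-major block ordering are assumptions, and `initial_covariance` and `initial_weights` are hypothetical names.

```python
import numpy as np


def initial_covariance(lon, lat, depth, n_models, sigma2=0.1, Lh_deg=1.0, Lv_m=20.0):
    """Block-diagonal P0: one identical block per model, no inter-model terms.

    Each block is sigma2 * C with C = exp(-dh / Lh) * exp(-|dz| / Lv), a
    separable decreasing exponential (assumed form; dh in degrees, dz in m).
    """
    lon, lat, depth = (np.asarray(a, dtype=float) for a in (lon, lat, depth))
    dh = np.hypot(lon[:, None] - lon[None, :], lat[:, None] - lat[None, :])
    dz = np.abs(depth[:, None] - depth[None, :])
    block = sigma2 * np.exp(-dh / Lh_deg) * np.exp(-dz / Lv_m)
    # kron(I, block) repeats the same block on the diagonal for each model
    return np.kron(np.eye(n_models), block)


def initial_weights(n_models, n_points):
    """All weights 1/NM, so the first 3DSE forecast equals the ensemble mean."""
    return np.full(n_models * n_points, 1.0 / n_models)
```

The product of two exponential kernels remains a valid (positive definite) correlation model, which is one reason to prefer the separable form in a sketch like this one.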
 We apply the method on a regular grid extending horizontally from 9 to 10.3°E and from 43.5 to 44.4°N with a resolution of 0.1°, and vertically from 0 to 100 m with a resolution of 10 m. Eight base times were selected according to data and model availability, in order to validate our results against independent data and to provide different test cases: 25 and 26 September, and 6, 17, 20, 23, 25 and 26 October 2008. Since the models have different lead times, the length of the forecast period is constrained by the shortest model lead time. Hence, the experiments are designed so that the most recent available model outputs are used for the training phase, while a free-run 48-h prediction is used for the forecast. This ensures temporal continuity in the model outputs and, consequently, in the SE outputs; though not strictly necessary, this property is highly desirable. For each base time, Forecasts minus Observations statistics are computed over 12-h bins: the RMSD (root-mean-square difference), which encompasses both the BIAS and the unbiased RMSD (URMSD), and the linear correlation coefficient (CORR). Statistical results are presented as averages over these eight test cases. A simple version of the super-ensemble with a single weight per model for the whole domain (0DSE) is also implemented to assess the contribution of the three-dimensional aspect of the method.
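The Forecasts minus Observations statistics can be computed per 12-h bin along the lines of the sketch below. It assumes matched forecast/observation pairs tagged with their time in hours relative to the base time (negative during the learning period); the function names are hypothetical.

```python
import numpy as np


def fmo_statistics(forecast, observed):
    """Forecast-minus-observation skill scores for one bin of matched pairs."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    d = f - o
    bias = d.mean()
    rmsd = np.sqrt(np.mean(d ** 2))
    # Unbiased RMSD: RMS difference of the anomalies about each series' own mean
    urmsd = np.sqrt(np.mean(((f - f.mean()) - (o - o.mean())) ** 2))
    corr = np.corrcoef(f, o)[0, 1]
    return {"BIAS": bias, "RMSD": rmsd, "URMSD": urmsd, "CORR": corr}


def binned_statistics(hours, forecast, observed, bin_hours=12.0):
    """Statistics over consecutive 12-h bins of time relative to the base time."""
    hours = np.asarray(hours, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    bins = np.floor(hours / bin_hours).astype(int)
    return {int(b): fmo_statistics(forecast[bins == b], observed[bins == b])
            for b in np.unique(bins)}
```

Averaging the per-bin dictionaries over the eight base times would then give curves comparable to those discussed below.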
3. Results
 First, to provide a qualitative evaluation of the results, Figure 2 presents the difference Forecasts minus Observations in the water column along the track profiled by the RU01 glider between 23 and 27 September 2008. Whereas NCOM strongly underestimates (resp. overestimates) the temperature in (resp. below) the mixed layer, MFS and MERCATOR typically show absolute differences lower than 2°C with respect to the glider measurements. The DA during the learning period leads to a good match between the 3DSE multi-model forecast and the observations before the base time, and the benefit remains substantial during the whole forecast period. The temperature in both the surface mixed layer (0 to 30-m depth) and the layer below 50 m is better estimated by the new method than by the individual models, the EM or the 0DSE. The thermocline is a much more sensitive area, owing to larger vertical gradients and an increased mismatch between individual models and observations; there, 3DSE forecast differences with observations reach 1 to 2°C. Note also the better performance of the 3DSE compared to the 0DSE during the forecast period, which highlights the benefit of spatially varying weights.
 Since the 3DSE method is based on the KF, the RMSD is the specific error measure that is expected to be reduced. The surface 3DSE RMSD (Figure 3, top) is significantly reduced during the learning period compared to the individual models and the EM (a reduction of around 60% with respect to the EM, down to 0.2°C), as a direct consequence of the DA. The 3-D variability of the model weights allows a better adjustment to the observations, as demonstrated by the comparison of 0DSE and 3DSE results during the learning phase. This better skill of the 3DSE persists throughout the 48-h forecast. Although the 3DSE RMSD progressively increases with time, it remains lower than the RMSD of the individual models, EM and 0DSE after two days of forecast, with values below 0.4°C. Note that the large discrepancy between NCOM outputs and observations is due to the lack of DA in this model for this particular experiment. Below the surface, where observations are sparser, the 3DSE RMSD is also reduced during the learning period, but reaches higher values (0.8°C) during the forecast period. These values are nevertheless still lower than the RMSD of the individual models, EM and 0DSE: the skill acquired during the 48 h of training also persists below the surface during the next 48 h.
 Finally, the 48-h BIAS, URMSD and CORR statistics are summarized in the target-like diagram [Jolliff et al., 2009] shown in Figure 4. The x and y axes represent the URMSD and the absolute value of the BIAS, respectively, whereas the dashed lines display the iso-RMSD contours. The marker type distinguishes surface from subsurface statistics, and the marker color indicates the value of the CORR. To build this diagram, the whole set of observations during the forecast period is considered and results are averaged over the eight base times. At the surface (star markers), the 3DSE BIAS and URMSD values are around 0.05 and 0.3°C, respectively, the lowest among the individual models and the 0DSE. The EM presents a slightly lower URMSD than the 3DSE, but a much larger BIAS. The 3DSE method also improves the correlation with surface observations compared to the individual models and the 0DSE. Below the surface (circle markers), the 0DSE shows a slightly smaller BIAS than the 3DSE, but the latter leads to a reduced URMSD and an improved correlation with subsurface observations. Altogether, the 3DSE improves the forecast skills compared to all individual models, the EM and the 0DSE, both at the surface and below it.
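The iso-RMSD contours of such a diagram follow from the standard decomposition of the mean squared difference over N forecast/observation pairs (f_i, o_i), with overbars denoting means over those pairs:

```latex
\mathrm{RMSD}^2
  = \frac{1}{N}\sum_{i=1}^{N} (f_i - o_i)^2
  = \underbrace{(\bar{f} - \bar{o})^2}_{\mathrm{BIAS}^2}
  + \underbrace{\frac{1}{N}\sum_{i=1}^{N} \big[(f_i - \bar{f}) - (o_i - \bar{o})\big]^2}_{\mathrm{URMSD}^2}
```

Points of equal RMSD therefore lie on quarter circles centered at the origin of the (URMSD, |BIAS|) plane. As a rough illustration with the approximate surface 3DSE values quoted above (BIAS ≈ 0.05°C, URMSD ≈ 0.3°C), RMSD ≈ (0.05² + 0.3²)^{1/2} = 0.0925^{1/2} ≈ 0.30°C; the identity holds exactly for a single set of pairs and only approximately once statistics are averaged over base times.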
4. Conclusions
 A dynamically adapting three-dimensional super-ensemble (3DSE) method exploiting weight error spatial covariances has been applied to forecast ocean temperature in a coastal area. Training and testing have been carried out with different data sets, consisting of SST analyses, CTD casts and glider tracks. The difference Forecasts minus Observations along glider tracks has shown that the 3DSE represents the temperature qualitatively better than the individual models, their ensemble mean (EM) and the single-weight-per-model super-ensemble (0DSE). Quantitatively, statistical results have shown a significant improvement of the forecasting skills when using the 3DSE. The new method yields the lowest root-mean-square difference (RMSD) against observations, with a reduction of both the BIAS and the unbiased RMSD, and an increased linear correlation coefficient (CORR) compared to the individual models. With respect to the EM, surface and subsurface RMSDs are reduced by 57% and 35%, respectively, while the CORR remains similar. As these skills have been shown to persist over a 48-h period, the 3DSE constitutes a suitable post-processing tool for improving operational forecasts. The method efficiently reconciles different model forecasts into a single improved prediction, which could be used in decision-making processes.
 Future developments of the technique might exploit numerical model covariances to further improve the spatial propagation of model weight updates into non-observed areas. The multivariate capability of the 3DSE should also be investigated.
 The OSTIA product has been derived from Met Office data, and this work has been carried out with the support of MERCATOR-OCÉAN, GNOO, NRL, NURC and Rutgers University. We would also like to thank A. Álvarez, G. Baldasserini, C. Lewis, G. Pennucci, and C. Trees for their help. This work was supported by the NURC and the French Community of Belgium (RACE, ARC-05/10-333). F. Lenartz and B. Mourre contributed equally to this work.