## 1. Introduction

Using collective information for decision-making is common sense in both everyday life and professional business. In particular, the more complex the processes involved, the more helpful additional input to the decision-making procedure can be (Branzei et al., 2000). On the other hand, an overload of possibly contradictory information can lead to suboptimal decisions. It has been shown that in a real world of confusing and overwhelming information, fast and frugal heuristics (i.e. simple rules for making decisions) can be powerful tools that do surprisingly well (Gigerenzer and Todd, 1999). That is, in general decision-making theory it is under debate whether more information leads to more success or whether ‘simplicity rules the world’.

The effect that more information does not necessarily lead to more success has been demonstrated for the case of weather forecasting by Heideman et al. (1993). Their results suggest that ‘the relation between information and skill in forecasting weather is complex’ and that ‘greater improvement in forecasting might be obtained by devoting resources to improving the use of information over and above those needed to increase the amount of information’. However, it is important to note that this effect generally holds only for an individual forecaster making decisions based on different levels of available information. It must not be confused with the attempt to improve predictions by utilizing more than one decision-making system, whether of subjective or objective nature. Many indications exist that such multiple decision-making systems (a group of forecasters/models) are generally superior to individual decision-making systems (a single forecaster/model).

For example, in short- and medium-range weather forecasting it was demonstrated as early as the 1960s that combining forecasts from individual forecasters can be beneficial. Sanders (1963) analysed multiple-person forecasts and showed that ‘the group-mean probability forecast is found to be a more skilful statement than the probability forecast of the most skilled individual’. His early findings were confirmed by later studies (Sanders, 1973; Bosart, 1975; Gyakum, 1986), and the extension of this concept from subjective multiple forecasters to objective multi-model prediction systems has also proven successful (Clemen and Murphy, 1986; Fraedrich and Leslie, 1987). Comparisons of multi-model and single-model performance suggest that ‘variations in model physics and numerics play a substantial role in generating the full spectrum of possible solutions’ (Fritsch et al., 2000).

However, using more than one model addresses only one of the two main sources of error. The second source of error, uncertainties in initial conditions, can be addressed by running an ensemble of forecasts from different initial conditions. This technique, known as ensemble prediction, is used with great success at forecasting centres around the world (Tracton and Kalnay, 1993; Molteni et al., 1996). Richardson (2000) has shown that probability forecasts derived from an ensemble prediction system (EPS) are of greater benefit than a deterministic forecast produced by the same model and that, for many users, the probability forecasts have more value than a shorter-range deterministic forecast.
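To make the notion of a probability forecast derived from an EPS concrete, the following minimal sketch estimates an event probability as the fraction of ensemble members predicting the event. All member values are made up for illustration and do not come from any actual EPS.

```python
import numpy as np

# Hypothetical 2 m temperature anomaly forecasts (K) from a
# 9-member ensemble; all values are purely illustrative.
members = np.array([1.2, 0.4, -0.3, 0.8, 1.5, 0.1, -0.6, 0.9, 0.7])

# Probability forecast for the event "anomaly above normal",
# estimated as the fraction of members predicting the event.
p_above = float(np.mean(members > 0.0))
print(p_above)  # 7 of the 9 members are positive, so p = 7/9
```

Such a probability statement, unlike a single deterministic forecast, allows each user to act according to her or his own cost/loss situation, which underlies the value comparison made by Richardson (2000).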

In order to take into account both model error and uncertainties in initial conditions, the multi-model and ensemble techniques can be combined into a new approach, known as the multi-model ensemble concept (Harrison et al., 1995; Palmer and Shukla, 2000; Palmer et al., 2004). The idea of the superiority of multiple-source prediction systems is based on the ‘incontrovertible fact that two or more inaccurate but independent predictions of the same future events may be combined in a very specific way to yield predictions that are, on the average, more accurate than either or any of them taken individually’ (Thompson, 1977). However, how ‘incontrovertible’ and widely accepted is this fact? Although many studies have demonstrated the success of the multi-model approach in practice, parts of the scientific community still dispute the general validity of the concept. Some of these reservations are caused by apparent misconceptions of the approach. Frequent questions in this context are the following.

- (i) How can a poor model add skill?
- (ii) How can the multi-model be better than the average single-model performance?
- (iii) Why not use the best single model instead of the multi-model?

In this paper we attempt to clarify such misconceptions by answering the above questions and by discussing, more generally, the rationale behind the multi-model ensemble concept.
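The combination argument of Thompson (1977) quoted above can be illustrated with a minimal Monte Carlo sketch. Assuming two unbiased forecasts with independent errors (the error variances of 1.0 and 2.0 are chosen purely for illustration), the variance-minimizing linear combination has a smaller mean squared error than either forecast alone:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
truth = rng.standard_normal(n)

# Two unbiased but inaccurate predictions with independent errors;
# error variances s1 = 1.0 and s2 = 2.0 are illustrative choices.
s1, s2 = 1.0, 2.0
f1 = truth + rng.normal(0.0, np.sqrt(s1), n)
f2 = truth + rng.normal(0.0, np.sqrt(s2), n)

# Variance-minimizing combination: weight each forecast inversely
# to its error variance, i.e. w1 = s2 / (s1 + s2).
w1 = s2 / (s1 + s2)
combined = w1 * f1 + (1.0 - w1) * f2

def mse(forecast):
    """Mean squared error against the synthetic truth."""
    return float(np.mean((forecast - truth) ** 2))

# The combined error variance, s1*s2/(s1+s2) = 2/3, is smaller than
# either individual error variance.
print(mse(f1), mse(f2), mse(combined))
```

This is the idealized core of the ‘incontrovertible fact’: independence of the errors, not accuracy of the individual forecasts, is what makes the combination pay off.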

The study is based on the extensive data set of seasonal hindcasts produced in the DEMETER project (Development of a European Multi-model Ensemble System for Seasonal to Interannual Prediction). DEMETER was conceived and funded as a European Union (EU) Framework-V project in order to advance the concept of multi-model ensemble prediction by installing a number of state-of-the-art global coupled ocean–atmosphere models on a single supercomputer, and to produce a series of six-month multi-model ensemble hindcasts with common archiving and common diagnostic software. A general description of the project, the coupled models involved and the data set produced, as well as the verification, downscaling and application of the data, is given in Palmer et al. (2004). Here, the DEMETER data set is used to study specifically the advantages and limitations of the multi-model ensemble approach in seasonal forecasting.

In its simplest form, a multi-model ensemble forecast is produced by merging the individual forecasts with equal weights. However, more complex methods of optimally combining the single-model output have been described (Krishnamurti et al., 1999; Pavan and Doblas-Reyes, 2000; Rajagopalan et al., 2002). A substantial amount of effort has been concentrated on assessing the performance of sophisticated techniques for constructing optimal multi-model ensembles. However, neither a comprehensive demonstration of the superiority of the multi-model approach for seasonal forecasting nor any substantial work on the rationale behind its success can be found in the literature. Motivated by this lack of groundwork, a comprehensive documentation of the improved performance of a multi-model ensemble system compared to single-model ensemble predictions is presented here, and an explanation for the multi-model superiority is proposed.
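For the equal-weight case, the merging amounts to pooling all members of the single-model ensembles into one larger ensemble. A minimal sketch (model names, ensemble sizes and forecast values are all hypothetical):

```python
import numpy as np

# Hypothetical forecasts of one scalar quantity from three
# single-model ensembles, three members each (values made up).
model_a = np.array([0.6, 0.9, 0.4])
model_b = np.array([0.2, 0.5, 0.3])
model_c = np.array([1.0, 0.8, 0.9])

# Simple (equal-weight) multi-model ensemble: pool all members, so
# that every member, and hence every model, carries the same weight.
multi_model = np.concatenate([model_a, model_b, model_c])

# With equal ensemble sizes, the multi-model ensemble mean equals the
# average of the single-model ensemble means.
print(multi_model.mean())
```

With unequal ensemble sizes, plain pooling would implicitly weight models by their member counts; equal weighting per model would then require averaging the single-model means instead.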

In order to separate the improvements found when using an ‘equal-weight’ multi-model ensemble forecast system from the likely improvements expected in optimal multi-model ensembles, the paper is split into two parts. In the first part, the basic concept of the multi-model approach is discussed along with results from the equal-weight multi-model ensemble, henceforth referred to as the simple multi-model ensemble. All issues related to advanced methods for calibrating and optimally combining models will be addressed in the second part of the paper. A careful examination of simple multi-model ensemble results is additionally motivated by the following facts: (i) robust optimal weights are difficult to calculate given the short samples available to train the models (Kharin and Zwiers, 2002; Peng et al., 2002), i.e. often the use of the simple multi-model is the only practical way of utilizing the multi-model approach; (ii) simple multi-model ensembles may be considered as a reference method for optimal multi-model ensemble systems.

A description of the data set and diagnostic tools used can be found in Section 2. In order to document the multi-model superiority, a comprehensive comparison of simple multi-model versus single-model results is presented in Section 3. A discussion of the rationale behind the superiority of the multi-model follows in Section 4, including some theoretical considerations and practical examples. The conclusions are summarized in Section 5.