The DEMETER multi-model ensemble system is used to investigate the rationale behind the multi-model concept. The differences between single-model and multi-model performance in the DEMETER hindcast data set are documented comprehensively. Both deterministic and probabilistic diagnostics are used, and a variety of analyses demonstrate the improvements achieved by using multi-model instead of single-model ensembles. To understand the reasons for the multi-model superiority, basic scenarios describing how the multi-model approach can improve on single-model skill are discussed. It is demonstrated that multi-model superiority is caused not only by error compensation but in particular by the greater consistency and reliability of the multi-model ensemble.