## 1. Introduction

[2] In recent years, model ensembles have become an important component of weather [*Bowler et al.*, 2008; *Du et al.*, 2009; *Houtekamer et al.*, 1996; *Hacker et al.*, 2011; *Stensrud et al.*, 2010], air quality [*McKeen et al.*, 2005; *Pagowski et al.*, 2005, 2006; *Pagowski and Grell*, 2006; *Mallet and Sportisse*, 2006; *Delle Monache et al.*, 2006a, 2006b, 2006c; *Zhang et al.*, 2007; *Vautard et al.*, 2009] and atmospheric dispersion predictions [*Galmarini et al.*, 2001; *Warner et al.*, 2002; *Draxler*, 2011; *Lee et al.*, 2009; *Kolczynski et al.*, 2009]. Moreover, recent efforts have demonstrated the superior performance of multimodel ensembles (comprising runs of different numerical prediction models, which differ in their initial and/or boundary conditions and in their numerical representation of the atmosphere) in weather [*Krishnamurti et al.*, 2009; *Bougeault et al.*, 2011], climate [*Krishnamurti et al.*, 2000] and atmospheric dispersion modeling [*Galmarini et al.*, 2004a; *Riccio et al.*, 2007; *Potempski et al.*, 2008].

[3] *Galmarini et al.* [2004a] proposed the so-called ‘Median Model’, defined as a new model constructed from the median of the ensemble members’ predictions, as a way to combine multimodel ensemble results. They demonstrated that the Median Model outperformed any single deterministic model in reproducing the concentrations of atmospheric pollutants measured during the ETEX experiment [*Girardi et al.*, 1998]. Moreover, a theoretical framework for determining optimal combinations of multimodel ensemble results has been established by *Potempski and Galmarini* [2009]. Computationally advanced and probabilistic ensemble weather forecasting approaches have been developed by other authors [*Ascione et al.*, 2006; *Montella et al.*, 2007; *Riccio et al.*, 2007; *Di Narzo and Cocchi*, 2010; *Fortin et al.*, 2006; *Katz and Ehrendorfer*, 2006; *Potempski et al.*, 2010].

[4] In the case of multimodel ensemble atmospheric dispersion modeling, the different models are certainly dependent to some degree, since they often share, among other features, initial/boundary data, numerical methods, parameterizations and emissions. Thus, model independence cannot be assumed for multimodel ensembles. We point out two potential consequences of this inter-dependency: (1) results obtained by ensemble analysis may lead to erroneous interpretations, since the models could all provide a wrong answer, which is more probable if the models are strongly dependent; and (2) as in time series analysis, where serial correlation reduces the *effective time series length* [*Bartlett*, 1935; *Thiébaux and Zwiers*, 1984], in a multimodel approach the *effective number of models* may be lower than the total number, since the models may depend linearly, or nonlinearly, on each other. The practical effects of model inter-dependency can be highlighted by the analysis of ETEX data [*Girardi et al.*, 1998]. ETEX-1 and ETEX-2 (the first and second European Tracer EXperiments) took place in 1994 and allowed several types of atmospheric dispersion models to be compared against observed concentrations. *Galmarini et al.* [2004a, 2004b] noted that the Median Model was usually superior to any single model in reproducing the concentrations measured during the ETEX-1 experiment. Table 1 shows the root mean square error (RMSE), correlation coefficient, FA2, FA5 and FOEX indices of each model for the ETEX-1 experiment. FA2 and FA5 give the percentage of model results within a factor of 2 and 5, respectively, of the corresponding measured value, while FOEX is the percentage of modeled concentration values that overestimate (positive) or underestimate (negative) the corresponding measurements. Median Model results averaged over models m01-m16, following *Galmarini et al.* [2004b], and over all available models are also shown in the last two rows.
*Potempski and Galmarini* [2009] showed that the mean of any *m*-member ensemble has a lower RMSE than any single model if the ratio between the highest and lowest error variance is less than *m* + 1. This theoretical result cannot be applied directly to the models reported in Table 1, since it requires several statistical constraints, such as model independence, to be satisfied; nonetheless, the Median Model results in Table 1 show an RMSE lower than that of the majority of the single models.
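For independent, unbiased models the variance-ratio condition can be checked analytically, since the expected MSE of the ensemble mean is simply the sum of the error variances divided by *m*^{2}. A minimal sketch (the variances are illustrative, not taken from Table 1):

```python
# Sketch of the Potempski and Galmarini [2009] condition under the idealized
# assumption of independent, unbiased models: the ensemble mean beats every
# single model in expected MSE whenever the highest-to-lowest error-variance
# ratio is below m + 1.

def mean_ensemble_mse(variances):
    """Expected MSE of the ensemble mean of independent, unbiased models."""
    m = len(variances)
    return sum(variances) / m**2

variances = [1.0, 2.0, 3.0]                    # m = 3; ratio 3.0 < m + 1 = 4
assert max(variances) / min(variances) < len(variances) + 1
assert mean_ensemble_mse(variances) < min(variances)  # mean beats best model
```

Here the mean's expected MSE is (1 + 2 + 3)/9 ≈ 0.67, below the best single model's variance of 1.0, as the theorem guarantees for this variance ratio.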

**Table 1.** RMSE, Correlation Coefficient, FA2, FA5, and FOEX Indices of Each Model for the ETEX-1 Experiment^{a}

Model | RMSE | CC | FA2 | FA5 | FOEX
---|---|---|---|---|---
m01 | 4.76 | 0.17 | 14.25 | 37.65 | 77
m02 | 0.71 | 0.30 | 22.00 | 45.91 | 61
m03 | 6.04 | 0.22 | 19.13 | 42.04 | 55
m04 | 7.4e8 | 0.17 | 0.00 | 0.00 | 100
m05 | 2.05 | 0.27 | 13.02 | 32.72 | 71
m06 | 7.56 | 0.17 | 22.91 | 47.37 | 2
m07 | 0.93 | 0.26 | 19.98 | 42.91 | 36
m08 | 0.72 | 0.23 | 8.11 | 18.08 | −42
m09 (Exp1) | 2.19 | 0.17 | 16.47 | 37.47 | 11
m10 (Exp1) | 1.81 | 0.41 | 15.11 | 35.32 | 17
m11 | 2.88 | 0.27 | 15.90 | 37.76 | 14
m12 | 2.27 | 0.26 | 21.00 | 42.43 | 34
m13 | 3.19 | 0.08 | 21.94 | 45.63 | 50
m14 | 3.06 | 0.13 | 12.34 | 28.35 | 56
m15 (Exp1) | 3.76 | 0.05 | 15.89 | 34.65 | 11
m16 | 8.53 | 0.08 | 21.97 | 44.39 | 36
m17 | 1.31 | 0.32 | 10.24 | 23.01 | 0
m18 | 2.89 | 0.20 | 17.61 | 37.82 | −4
m19 | 1.47 | 0.27 | 21.81 | 46.12 | 76
m20 (Exp2) | 0.45 | 0.08 | 20.24 | 46.64 | −8
m21 | 5.32 | 0.22 | 18.90 | 43.00 | 45
m22 (Exp1, Exp2) | 1.79 | 0.24 | 27.96 | 54.76 | 21
m23 (Exp2) | 0.53 | 0.24 | 11.33 | 26.32 | −28
m24 (Exp2) | 2.22 | 0.20 | 21.67 | 47.59 | 44
m25 | 3.27 | 0.24 | 22.93 | 46.64 | 50
m26 (Exp1) | 1.20 | 0.08 | 10.82 | 27.09 | −7
MM 1-16 | 1.30 | 0.29 | 24.14 | 48.38 | 15
MM 1-26 | 1.15 | 0.30 | 26.43 | 50.99 | 13

^{a}The last column (FOEX) gives the percentage of over-predictions (>0) or under-predictions (<0). The next-to-last row shows Median Model results averaged over models m01 to m16 (following *Galmarini et al.* [2004b]); the last row shows Median Model results averaged over all available models. RMSE units are ng/m^{3}. Models labeled ‘Exp1’ or ‘Exp2’ are those selected by the two clustering procedures (see section 3.1 for details).
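The indices reported in Table 1 can be computed from paired model-observation values. A minimal sketch in Python (the toy arrays are illustrative; the signed FOEX convention, percentage of over-predictions minus percentage of under-predictions, is an assumption consistent with the ±100 range of the table):

```python
import numpy as np

def scores(model, obs):
    """RMSE, FA2/FA5 (percentage of predictions within a factor of 2/5 of
    the observation), and FOEX (% over-predictions minus % under-predictions;
    an assumed convention consistent with Table 1's range)."""
    model, obs = np.asarray(model, float), np.asarray(obs, float)
    rmse = np.sqrt(np.mean((model - obs) ** 2))
    ratio = model / obs                       # assumes strictly positive obs
    fa2 = 100.0 * np.mean((ratio >= 0.5) & (ratio <= 2.0))
    fa5 = 100.0 * np.mean((ratio >= 0.2) & (ratio <= 5.0))
    foex = 100.0 * (np.mean(model > obs) - np.mean(model < obs))
    return rmse, fa2, fa5, foex

# Example call on toy data (3 paired values):
rmse, fa2, fa5, foex = scores([2.0, 1.0, 10.0], [1.0, 1.0, 1.0])
```

Real verification data would require handling observations at or below the detection limit before forming the ratio.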

[5] We can gain insight into model inter-dependency by looking at the same data from a different perspective: sort the models in descending order using the RMSE as the ordering criterion, and denote by {*r*_{i}}_{i=1,…,26} the permuted labels, so that *r*_{1} indicates the model with the highest RMSE, *r*_{2} the second highest, and so on. The median value can then be recalculated at each spatiotemporal location using data from the first *m* reordered models, i.e. from {*r*_{j}}_{j=1,…,m}, with *m* ∈ {1, 2, …, 26}. There are thus twenty-six possible ‘Median Models’, depending on how many models are included in the statistics. Figure 1 shows the RMSE of these Median Models.
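The reordering procedure above can be sketched in a few lines (a minimal illustration on synthetic data; rows of `preds` are models, columns are spatiotemporal locations):

```python
import numpy as np

def median_model_curve(preds, obs):
    """RMSE of the 'Median Model' built from the first m models after
    sorting models by descending individual RMSE, for m = 1, ..., n_models."""
    rmse = np.sqrt(np.mean((preds - obs) ** 2, axis=1))  # per-model RMSE
    order = np.argsort(rmse)[::-1]                       # worst model first
    curve = []
    for m in range(1, preds.shape[0] + 1):
        med = np.median(preds[order[:m]], axis=0)        # m-model Median Model
        curve.append(np.sqrt(np.mean((med - obs) ** 2)))
    return np.array(curve)

# Toy example: three "models" at two locations, zero observations.
preds = np.array([[3.0, 3.0], [1.0, 1.0], [2.0, 2.0]])
curve = median_model_curve(preds, np.zeros(2))
```

With models added from worst to best, the curve decreases monotonically in this toy case, mirroring the qualitative behavior shown in Figure 1.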

[6] According to *Potempski and Galmarini* [2009], the mean square error decays asymptotically as 1/*m*, where *m* is the total number of models considered. Indeed, the RMSE in Figure 1 shows an asymptotically decreasing trend, in agreement with this theoretical expectation. After an initial, fast decrease, the RMSE slowly converges to its limit. This means that better predictive capabilities are obtained at the expense of ever greater effort, as measured by the number of models; it also suggests that, if models are selected according to a suitable criterion, a drastic RMSE reduction can be achieved with only a few models.

[7] The penalization of “more complex” hypotheses is a long-standing approach in Bayesian inference, elegantly expressed by Ockham's razor: *entia non sunt multiplicanda praeter necessitatem* (entities should not be multiplied beyond necessity).

[8] These general considerations about complexity reduction raise some critical issues concerning the extraction of accurate and essential information from large ensembles: (1) how to represent ensemble results, building probabilistic forecasts whose performance is similar to that obtained from the whole data set but using fewer models; and (2) how to select a subset of models with the minimum loss of performance.

[9] In this work we suggest that statistical information can be used as a guideline for reorganizing multimodel ensemble data. The aim is to find an approach for the automatic classification of models that share similar features, using statistical measures of independence. We will show how to exploit these statistics to build a dissimilarity matrix and cluster the available models along a given set of axes. The most representative models of each cluster will then be used to build a reduced ensemble data set from a few models, and we will show that this subset of model data, selected on the basis of uncorrelatedness and/or independence, performs as well as, or even better than, the whole data set.
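The flavor of such a procedure can be conveyed with a minimal sketch: four synthetic "models", a correlation-based dissimilarity matrix (the 1 − |r| choice and the 0.5 grouping threshold are illustrative only; section 3 details the actual independence measures used), and a simple greedy grouping:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=(2, 200))               # two independent "signal" patterns
# Four toy models: m0, m1 track signal 0; m2, m3 track signal 1.
models = np.stack([t[0] + 0.1 * rng.normal(size=200),
                   t[0] + 0.1 * rng.normal(size=200),
                   t[1] + 0.1 * rng.normal(size=200),
                   t[1] + 0.1 * rng.normal(size=200)])

corr = np.corrcoef(models)
dissim = 1.0 - np.abs(corr)                 # 0 = perfectly correlated models

# Greedy grouping: two models share a cluster when their dissimilarity
# falls below a (hypothetical) threshold.
threshold = 0.5
labels = -np.ones(len(models), dtype=int)
for i in range(len(models)):
    if labels[i] < 0:
        labels[i] = labels.max() + 1        # open a new cluster
    for j in range(i + 1, len(models)):
        if labels[j] < 0 and dissim[i, j] < threshold:
            labels[j] = labels[i]

representatives = [np.flatnonzero(labels == k)[0] for k in np.unique(labels)]
```

The two strongly correlated pairs end up in separate clusters, and picking one representative per cluster yields a reduced ensemble, the idea pursued with proper independence statistics in section 3.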

[10] In section 2 we first summarize the most important theoretical properties concerning uncorrelatedness and independence; then, in section 3 we apply these concepts to the analysis of multimodel ensemble data from the ETEX-1 experiment. Conclusions are drawn in section 4.