On the systematic reduction of data complexity in multimodel atmospheric dispersion ensemble modeling

Authors


Abstract

[1] The aim of this work is to explore the effectiveness of information-theoretic approaches for the reduction of data complexity in multimodel ensemble systems. We first exploit a weak form of independence, i.e. uncorrelation, as a mechanism for detecting linear relationships. Then, stronger and more general forms of independence measure, such as mutual information, are used to investigate dependence structures for model selection. A distance matrix, measuring the interdependence between data, is derived for the investigated measures, with the aim of clustering correlated/dependent models together. Redundant information is discarded by selecting a few representative models from each cluster. We apply the clustering analysis in the context of atmospheric dispersion modeling, using the ETEX-1 data set. We show how the selection of a small subset of models, according to the uncorrelation or mutual information distance criteria, usually suffices to achieve a statistical performance comparable to, or even better than, that achieved by the whole ensemble data set, thus providing a simpler description of ensemble results without sacrificing accuracy.

1. Introduction

[2] Within recent years, the use of model ensembles has become an important component in weather [Bowler et al., 2008; Du et al., 2009; Houtekamer et al., 1996; Hacker et al., 2011; Stensrud et al., 2010], air quality [McKeen et al., 2005; Pagowski et al., 2005, 2006; Pagowski and Grell, 2006; Mallet and Sportisse, 2006; Delle Monache et al., 2006a, 2006c, 2006b; Zhang et al., 2007; Vautard et al., 2009] and atmospheric dispersion predictions [Galmarini et al., 2001; Warner et al., 2002; Draxler, 2011; Lee et al., 2009; Kolczynski et al., 2009]. Moreover, recent efforts demonstrated the superior performance of multimodel ensembles (comprising multiple runs of different numerical prediction models, which differ in the input initial and/or boundary conditions and the numerical representation of the atmosphere) in weather [Krishnamurti et al., 2009; Bougeault et al., 2011], climate [Krishnamurti et al., 2000] and atmospheric dispersion modeling [Galmarini et al., 2004a; Riccio et al., 2007; Potempski et al., 2008].

[3] Galmarini et al. [2004a] proposed the so-called ‘Median Model’, defined as a new model constructed from the median of model predictions, to combine multimodel ensemble results. They demonstrated that the Median Model outperformed the results of any single deterministic model in reproducing the concentration of atmospheric pollutants measured during the ETEX experiment [Girardi et al., 1998]. Moreover, a theoretical framework for determining optimal combinations of multimodel ensemble results has been set by Potempski and Galmarini [2009]. Computationally advanced and probabilistic ensemble weather forecasting approaches have been developed by other authors [Ascione et al., 2006; Montella et al., 2007; Riccio et al., 2007; Di Narzo and Cocchi, 2010; Fortin et al., 2006; Katz and Ehrendorfer, 2006; Potempski et al., 2010].

[4] In the case of multimodel ensemble atmospheric dispersion modeling, different models are certainly more or less dependent, since they often share, among other features, initial/boundary data, numerical methods, parameterizations and emissions. Thus, model independence cannot be assumed for multimodel ensembles. We point out two potential consequences of inter-dependency: (1) results obtained by ensemble analysis may lead to erroneous interpretations, since models could provide a wrong answer, and this is more probable if models are strongly dependent; and (2) as in time series analysis, where serial correlation reduces the effective time series length [Bartlett, 1935; Thiébaux and Zwiers, 1984], in a multimodel approach the effective number of models may be lower than the total number, since models could be linearly, or nonlinearly, dependent on each other. The practical effects of model inter-dependency can be highlighted by the analysis of ETEX data [Girardi et al., 1998]. ETEX-1 and ETEX-2 (the first and second European Tracer EXperiment) took place in 1994 and allowed the comparison of several types of atmospheric dispersion models against observed concentrations. Galmarini et al. [2004a, 2004b] noted that the Median Model usually proved to be superior to any single model in reproducing the measured concentrations of the ETEX-1 experiment. Table 1 shows the root mean square error, correlation coefficient, FA2, FA5 and FOEX indexes of each model for the ETEX-1 experiment. FA2 and FA5 give the percentage of model results within a factor of 2 and 5, respectively, of the corresponding measured value, while FOEX is the percentage of modeled concentration values that overestimate (positive) or underestimate (negative) the corresponding measurement. The Median Model results averaged over models m01-m16, following Galmarini et al. [2004b], and over all available models are also shown in the last two rows.
Potempski and Galmarini [2009] showed that the mean of any m-member ensemble has a lower RMSE than any single model if the ratio between the highest and lowest variance is less than m + 1. This theoretical insight cannot be directly applied to the models reported in Table 1, since several statistical constraints, such as model independence, should be satisfied; however, the Median Model results in Table 1 show an RMSE lower than that of the majority of single models.

Table 1. Root Mean Square Error (RMSE), Correlation Coefficient (CC) and Percentage of Coupled Measured-Modeled Data Within a Factor of 2 (FA2) and 5 (FA5) for the ETEX-1 Experiment(a)

Model              RMSE    CC     FA2     FA5    FOEX
m01                4.76    0.17   14.25   37.65    77
m02                0.71    0.30   22.00   45.91    61
m03                6.04    0.22   19.13   42.04    55
m04                7.4e8   0.17    0.00    0.00   100
m05                2.05    0.27   13.02   32.72    71
m06                7.56    0.17   22.91   47.37     2
m07                0.93    0.26   19.98   42.91    36
m08                0.72    0.23    8.11   18.08   −42
m09 (Exp1)         2.19    0.17   16.47   37.47    11
m10 (Exp1)         1.81    0.41   15.11   35.32    17
m11                2.88    0.27   15.90   37.76    14
m12                2.27    0.26   21.00   42.43    34
m13                3.19    0.08   21.94   45.63    50
m14                3.06    0.13   12.34   28.35    56
m15 (Exp1)         3.76    0.05   15.89   34.65    11
m16                8.53    0.08   21.97   44.39    36
m17                1.31    0.32   10.24   23.01     0
m18                2.89    0.20   17.61   37.82    −4
m19                1.47    0.27   21.81   46.12    76
m20 (Exp2)         0.45    0.08   20.24   46.64    −8
m21                5.32    0.22   18.90   43.00    45
m22 (Exp1, Exp2)   1.79    0.24   27.96   54.76    21
m23 (Exp2)         0.53    0.24   11.33   26.32   −28
m24 (Exp2)         2.22    0.20   21.67   47.59    44
m25                3.27    0.24   22.93   46.64    50
m26 (Exp1)         1.20    0.08   10.82   27.09    −7

MM 1-16            1.30    0.29   24.14   48.38    15
MM 1-26            1.15    0.30   26.43   50.99    13

(a) The last column (FOEX) gives the percentage of over-predictions (>0) or under-predictions (<0). In the next-to-last row, the Median Model results averaged over models m01 to m16 (following Galmarini et al. [2004b]) are shown; in the last row, the Median Model results averaged over all available models are shown. The RMSE dimensions are ng/m3. Models labeled ‘Exp1’ or ‘Exp2’ are those selected by the two clustering procedures (see section 3.1 for details).

[5] We can gain insight into model data inter-dependency by looking at the same data from a different perspective: sort models in descending order using the RMSE as the ordering criterion, and denote by {ri}i=1,…,26 the permuted labels, so that r1 indicates the model with the highest RMSE, r2 the second highest, and so on. The median value can be recalculated at each spatiotemporal location using data from the first m reordered models, i.e. from {rj}j=1,…,m, with m ∈ {1, 2, …, 26}. Of course there could be twenty-six ‘Median Models’, depending on how many models are included in the statistics. Figure 1 shows the RMSE of these Median Models.
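As an illustration, the ranking exercise above can be sketched in a few lines of Python. The ensemble below is synthetic stand-in data (the observations, model count and skill spread are hypothetical), not the ETEX-1 results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the ensemble: 26 "models" predicting
# concentrations at n space-time lattice points (hypothetical data).
n_models, n_points = 26, 500
obs = rng.gamma(2.0, 1.0, size=n_points)                  # "observations"
models = obs + rng.normal(0, 1, (n_models, n_points)) * \
         rng.uniform(0.5, 3.0, (n_models, 1))             # varying skill

def rmse(pred, obs):
    return np.sqrt(np.mean((pred - obs) ** 2))

# Rank models by RMSE in descending order: r1 = worst, r2 = next, ...
order = np.argsort([rmse(m, obs) for m in models])[::-1]

# The m-th 'Median Model' is the pointwise median of the m worst models.
median_rmse = [rmse(np.median(models[order[:m]], axis=0), obs)
               for m in range(1, n_models + 1)]
```

On data of this kind the curve of `median_rmse` decreases with m, mirroring the behavior discussed for Figure 1.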

Figure 1.

RMSE of the ‘Median Models’ using ETEX-1 data. The mth Median Model results were calculated using data from {ri}i=1,…,m, where ri denotes the ith-ranked model, i.e. the model with the ith highest RMSE. Results from {r1} and {ri}i=1,2 are not shown because they are much greater than 5 ng/m3.

[6] According to Potempski and Galmarini [2009], the mean square error decays asymptotically as 1/m, where m is the total number of models considered. Indeed, the RMSE in Figure 1 shows an asymptotically decreasing trend, in agreement with theoretical expectation. After an initial fast decrease, the RMSE slowly converges to its limit. This means that better predictive capabilities are obtained at the expense of greater and greater efforts, as measured by the number of models; conversely, if models are selected according to a suitable criterion, a drastic RMSE reduction can be achieved with a small number of models.

[7] The penalization of “more complex” hypotheses is a long-standing approach in Bayesian inference, as elegantly expressed by Ockham's razor: entia non sunt multiplicanda praeter necessitatem (entities should not be multiplied beyond necessity).

[8] These general considerations about complexity reduction raise some critical issues concerning the extraction of accurate and essential information from large ensembles: (1) how to represent ensemble results, building probabilistic forecasts whose performance is similar to that given by the whole data set but using fewer models, and (2) how to select a subset of models with the minimum loss of performance.

[9] In this work we suggest that statistical information can be used as a guideline to reorganize multimodel ensemble data. The aim is to find an approach for the automatic classification of models that share similar features by using statistical independence measures. We will show how to exploit these statistics to build a dissimilarity matrix and cluster the available models along a given set of axes. The most representative models of each cluster will be used to build a reduced ensemble data set from a few models, and we will show that the subset of model data, selected on an uncorrelation and/or independence basis, performs as well as, or even better than, the whole data set.

[10] In section 2 we first summarize the most important theoretical properties concerning uncorrelatedness and independence; then, in section 3 we apply these concepts to the analysis of multimodel ensemble data from the ETEX-1 experiment. Conclusions are drawn in section 4.

2. Theoretical Background

[11] We remark that the main aim of this work is to introduce a new approach to select a subset of models and discard redundant information, based mainly on correlation and independence. These concepts have profound implications in the physical [Dong and Zhou, 2010; Peleg et al., 2010] and information [Cover and Thomas, 2006; Papoulis, 1991] sciences. We briefly describe them here, with a view to efficiently combining multimodel ensemble data.

2.1. Uncorrelatedness and Independence

[12] Assume that we have m different models, whose predictions are available at a predefined space-time lattice, and assume that ξk and ξl are the results of the kth and lth model, respectively.

[13] Due to the unavoidable model uncertainties, it is customary to consider ξk as the outcome of a random process. Let ξk be written as an n-dimensional column vector

$$\xi_k = \left[\xi_k(1),\, \xi_k(2),\, \ldots,\, \xi_k(n)\right]^T \qquad (1)$$

where n is the total number of space-time lattice points at which results from the kth model are available, and T denotes the transpose, and let

$$C(k,l) = E\left\{(\xi_k - \mu)^T (\xi_l - \mu)\right\} \qquad (2)$$

the (k, l)th element of the m × m covariance matrix, where μ is the vector of expected values over the multimodel ensemble space at the n space-time lattice points, and E{·} denotes expectation.

[14] Models are said to be uncorrelated if their m × m covariance matrix is diagonal, i.e. if C(k,l) ≡ 0 for any kl.

[15] A key concept that we use in the following is that of statistical independence of models [Cover and Thomas, 2006; Papoulis, 1991], which can be rigorously defined in terms of probability densities. Given two models, k and l, and their joint density function, pk,l(ξk, ξl), the marginal densities, pk(ξk) and pl(ξl), are obtained by integrating over the other random vector in their joint density, i.e.:

$$p_k(\xi_k) = \int p_{k,l}(\xi_k, \xi_l)\, d\xi_l \qquad (3)$$
$$p_l(\xi_l) = \int p_{k,l}(\xi_k, \xi_l)\, d\xi_k \qquad (4)$$

where the integrations are carried out over the domains of ξl and ξk, respectively. The two models, k and l, are said to be independent if and only if

$$p_{k,l}(\xi_k, \xi_l) = p_k(\xi_k)\, p_l(\xi_l) \qquad (5)$$

In other words, the joint density of model data, pk,l(ξk, ξl) must factorize into the product of two marginal densities, pk(ξk) and pl(ξl), each depending on only one model. It can be shown [Cover and Thomas, 2006; Papoulis, 1991] that equation (5), defining statistical independence, implies uncorrelatedness. The reverse is not always true: uncorrelated data are not necessarily independent. However, if the random variables have Gaussian distributions, uncorrelatedness implies independence, though this property is not shared by other distributions in general.

2.2. Mutual Information

[16] In probability and information theory, the mutual information (MI) of two random variables is a quantity that measures the mutual dependence of the two variables. To explain this concept we start by defining the differential entropy H [Cover and Thomas, 2006; Papoulis, 1991] of a random variable ξk with density p(·) as

$$H(\xi_k) = -\int p(\xi_k) \log p(\xi_k)\, d\xi_k \qquad (6)$$

Analogously, the joint entropy of two random variables is

$$H(\xi_k, \xi_l) = -\iint p(\xi_k, \xi_l) \log p(\xi_k, \xi_l)\, d\xi_k\, d\xi_l \qquad (7)$$

where p(ξk, ξl) is the joint probability density function, and p(ξk) and p(ξl) are the marginal probability density functions of ξk and ξl, respectively.

[17] Formally, the MI of two continuous random variables ξk and ξl can be defined as

$$I(\xi_k; \xi_l) = \iint p(\xi_k, \xi_l) \log \frac{p(\xi_k, \xi_l)}{p(\xi_k)\, p(\xi_l)}\, d\xi_k\, d\xi_l \qquad (8)$$

MI is a measure of dependence in the following sense: I(ξk; ξl) = 0 if and only if ξk and ξl are independent random variables. On the other hand, if ξk and ξl are the same, MI coincides with the entropy of ξk, i.e. all information conveyed by ξk is shared with ξl: knowing ξk exactly determines the value of ξl and vice versa, and ξk and ξl have the same entropy. MI is nonnegative (I(ξk; ξl) ≥ 0) and symmetric, i.e. I(ξk; ξl) = I(ξl; ξk) [Cover and Thomas, 2006; Papoulis, 1991].
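A simple plug-in (histogram) estimator illustrates how MI behaves on dependent versus independent pairs; the bin count and the Gaussian test data below are assumptions for illustration only, not the estimator used in the paper (which relies on the mRMR framework discussed later):

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram (plug-in) estimate of I(x; y) in nats:
    discretize, then sum p_xy * log(p_xy / (p_x * p_y))."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # 0 * log(0) -> 0 by convention
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
x = rng.normal(size=20000)
noise = rng.normal(size=20000)
mi_dep = mutual_information(x, x + 0.1 * noise)  # strongly dependent pair
mi_ind = mutual_information(x, noise)            # independent pair
```

For the dependent pair the estimate is large, while for the independent pair it is close to zero (up to the small positive bias inherent in histogram estimators).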

2.3. Metric

[18] In order to compare multimodel ensemble data, we need a function to measure the ‘distance’ between models. We use two different measures of distance.

[19] 1. In the first case, we measure the distance between two models, k and l, by means of their Pearson (or correlation) coefficient; we define the uncorrelation distance as

$$d_{CC}(k,l) = 1 - CC(k,l), \qquad CC(k,l) = \frac{C(k,l)}{\sqrt{C(k,k)\, C(l,l)}} \qquad (9)$$

CC(k,l) is the correlation coefficient between models k and l, and C(k,l) is their covariance. Each element of dCC varies between 0 (perfectly correlated models) and 2 (perfectly anticorrelated models). Two models are said to be distant (uncorrelated) if their co-variability is close to zero, and are close if they are highly correlated.
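Assuming the form d_CC(k, l) = 1 − CC(k, l) (consistent with the stated range from 0 to 2), the uncorrelation distance matrix follows directly from the model data; the four "models" below are synthetic and purely illustrative:

```python
import numpy as np

def uncorrelation_distance(data):
    """d_CC(k, l) = 1 - CC(k, l): 0 for perfectly correlated models,
    ~1 for uncorrelated ones, 2 for perfect anticorrelation.
    `data` has shape (n_models, n_points)."""
    return 1.0 - np.corrcoef(data)

rng = np.random.default_rng(2)
base = rng.normal(size=1000)
data = np.stack([base,                      # model 0
                 2.0 * base + 1.0,          # model 1: linear copy of 0
                 -base,                     # model 2: anticorrelated
                 rng.normal(size=1000)])    # model 3: unrelated
d = uncorrelation_distance(data)
```

Note that the linear copy (model 1) sits at distance ~0 from model 0 even though its values differ; the distance reacts to co-variability, not to magnitude.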

[20] 2. In the second case the distance is based on a normalized variant of mutual information

$$d_{MI}(k,l) = 1 - \frac{I(\xi_k; \xi_l)}{H(\xi_k, \xi_l)} \qquad (10)$$

[21] Definitions (9) and (10) satisfy some of the basic properties of a metric, such as non-negativity and symmetry; moreover, (10) also satisfies the triangle inequality and the identity of indiscernibles [Hattori, 2003].
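One common normalized MI variant, assumed here for illustration, divides MI by the joint entropy, d = 1 − I/H(x, y): identical variables map to ~0 and independent variables to ~1. A coarse histogram sketch on synthetic Gaussian data:

```python
import numpy as np

def mi_distance(x, y, bins=16):
    """Sketch of a normalized MI distance, assuming the joint-entropy
    normalization d = 1 - I(x; y) / H(x, y) (illustrative choice)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def h(p):                      # discrete (binned) entropy in nats
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    h_joint = h(pxy.ravel())
    mi = h(px) + h(py) - h_joint   # I(x;y) = H(x) + H(y) - H(x,y)
    return 1.0 - mi / h_joint

rng = np.random.default_rng(3)
x = rng.normal(size=50000)
d_same = mi_distance(x, x)                        # duplicated information
d_ind = mi_distance(x, rng.normal(size=50000))    # unrelated data
```

With this normalization the distance is bounded, which makes cutting heights (such as the 0.94 used in section 3.1) directly interpretable.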

[22] Mutual information-based analysis has proven to be a flexible and general approach and has been used in the ecological [Maier et al., 2006], bioinformatics [Priness et al., 2007; Li et al., 2001; Kraskov and Grassberger, 2009], text mining [Dhillon et al., 2003], computational chemistry [Hamacher, 2007] and information science [Sotoca and Pla, 2010] fields, among many others. For the practical calculation of the joint entropy and mutual information we exploited the minimal-redundancy-maximal-relevance (mRMR) framework described by Peng et al. [2005].

[23] Finally, we remark on the types of dependencies equations (9) and (10) can detect. The correlation distance can recognize only linear relationships between dependent data; for example, suppose that a random variable x is symmetrically distributed about zero, and let y = x2. Then y is completely determined by x, so that x and y are perfectly dependent, yet their correlation is zero. There are more robust definitions of correlation distance, more sensitive to nonlinear relationships [e.g., see Székely et al., 2007], but they are much more computationally expensive than the standard correlation coefficient, even for moderate sample sizes as in the case of the ETEX-1 database. However, as already remarked, we are concerned with deterministic models, so it is highly probable that they share strong linear relationships, and even the simple correlation coefficient can detect mutual dependencies and work satisfactorily, as we show in section 3.
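The y = x2 example can be verified numerically: the correlation is (numerically) zero while a histogram MI estimate is clearly positive. The sample size and bin count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100000)        # symmetric about zero
y = x ** 2                         # fully determined by x

# Linear measure: the correlation coefficient is ~0 ...
cc = np.corrcoef(x, y)[0, 1]

# ... but a plug-in (histogram) MI estimate is clearly positive.
pxy, _, _ = np.histogram2d(x, y, bins=24)
pxy /= pxy.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0
mi = float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

This is exactly the failure mode of equation (9) that the MI-based distance (10) is designed to avoid.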

2.4. Agglomerative Approach

[24] Having defined a distance between models, i.e. a distance matrix whose elements are the coefficients in equation (9) or (10), models can be classified using an agglomerative approach. The aim is to build a hierarchical tree (dendrogram) [Amato et al., 2006; Napolitano et al., 2008; Ciaramella et al., 2008, 2009] that clusters the distributions obtained from the ensemble data by means of an unsupervised method: models that produce similar data are clustered together, while dissimilar models, i.e. models that produce uncorrelated/independent data, are agglomerated into different clusters. In this way we can cluster models into uncorrelated/independent groups and select a few models that can be considered representative of each group. The representative models are defined as the models, one from each cluster, whose data are closest (in the mean square deviation sense) to the centroid of the cluster they belong to.
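A minimal sketch of the agglomerative step, using SciPy's complete-linkage implementation on an uncorrelation distance matrix built from a synthetic ensemble of three model "families" (the family structure, noise level and cutting height are hypothetical, chosen only to make the clusters obvious):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)
# Hypothetical ensemble: three 'families' of models sharing a common signal.
s1, s2, s3 = rng.normal(size=(3, 400))
data = np.stack([s1 + 0.2 * rng.normal(size=400) for _ in range(3)] +
                [s2 + 0.2 * rng.normal(size=400) for _ in range(3)] +
                [s3 + 0.2 * rng.normal(size=400) for _ in range(2)])

d = 1.0 - np.corrcoef(data)          # uncorrelation distance matrix
np.fill_diagonal(d, 0.0)

# Complete-linkage agglomeration on the condensed distance matrix,
# then a horizontal cut of the dendrogram at a chosen height:
tree = linkage(squareform(d, checks=False), method='complete')
labels = fcluster(tree, t=0.5, criterion='distance')
```

`labels` assigns each model to a cluster; models built from the same underlying signal end up in the same group, and the cut at height 0.5 recovers the three families.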

3. Results

[25] In this section we show the results obtained by applying the proposed approach to analyze the multimodel ensemble data of the ETEX-1 experiment.

[26] As part of the ETEX-1 experiment, 340 kg of perfluoromethylcyclohexane, C7F14, were released on 23 October 1994 at 16:00 UTC (T0) from a stack in Monterfil, southeast of Rennes (France). The weather conditions have been described in detail by Girardi et al. [1998]. Briefly, a steady westerly flow of unstable air masses was present over central Europe. Such conditions persisted for the 90 hours that followed the release with frequent precipitation events over the advection area and a slow movement toward the North Sea region. Concurrently, this passive tracer was recorded at more than 150 sites all across Europe every three hours; the reader may refer to Girardi et al. [1998] and Nodop et al. [1998] for details on the sampling and analysis technique used during the experiment.

[27] Several independent groups worldwide tried to forecast these observations. Each simulation, and therefore each ensemble member, is produced with a different atmospheric dispersion model and is based on weather fields generated by (most of the time) different Global Circulation Models (GCM). For details on the groups involved in the exercise and the model characteristics, refer to Galmarini et al. [2004b]. Ten additional sets [Riccio et al., 2007] are available for this analysis. These include one set of results from the Danish Meteorological Institute (DMI), one set from the Korean Atomic Energy Agency, three sets from the Finnish Meteorological Institute (FMI), one set from the UK Met Office, three sets from Meteo-France and another set from MeteoSwiss.

[28] Galmarini et al.'s [2004b] analysis of model results showed that the single models produced a wide spectrum of different time evolutions of the concentration fields, while the Median Model provided a more accurate reproduction of the concentration trend and persistence at the sampling locations. In the following analysis, we discard the results from model m04, since it is clearly affected by overestimation problems (see Table 1) due to an erroneous release rate. All other simulations relate to the same release conditions.

3.1. Cluster Analysis

[29] We remark that the aim of our analysis is to find the models that share similar features, by using both correlation and mutual information statistical measures. Given the model output and the definition of inter-model distances (equations (9) or (10)), the dissimilarity matrix is determined, and the clustering process proceeds without any other information. Here, we show how to exploit the dissimilarity matrix for the systematic selection of a few models.

[30] Figure 2 presents the dendrogram plots of the hierarchical cluster tree generated by the complete-linkage agglomerative mechanism [Wilks, 2006], where the distance between distributions is calculated according to equation (9) or (10). The height of each inverted U-shaped line represents the distance between the two clusters being connected. We can horizontally cut the dendrogram at the smallest height intersecting the tree at m vertical lines, thus selecting m groups of models and achieving the desired ‘complexity reduction’. The selection of the cutting height is partially arbitrary; it is customary to cut the dendrogram at a height where the distance to the next clustered groups is relatively large and the retained number of clusters is small compared to the original number of models. In Figure 2 the uncorrelation-based dendrogram shows relatively large jumps above a height of about 0.55, defining five groups of models; for the MI-based dendrogram there is no clear cutting height, so in the following we show the results obtained by cutting the dendrogram based on the normalized MI distance at a height of 0.94. This choice further reduces the number of clusters, since only four clusters are retained. The models associated with each cluster are highlighted with different colors in Figure 2.

Figure 2.

Dendrograms obtained using the dissimilarity matrix based on (top) the uncorrelation and (bottom) the mutual information distance. On the y-axis the distance between the two clusters being connected can be read. Colors indicate models grouped to different clusters (see the text for details).

[31] Next, the most representative models of each cluster are used to build a reduced ensemble data set. From each cluster we pick one representative model; as selection criterion, we choose the model closest to the cluster centroid, where the cluster centroid is defined as the mean over all models associated with that cluster. Table 1 labels the representative models selected on the basis of the uncorrelation distance with the tag ‘Exp1’, and the models selected on the basis of mutual information with the tag ‘Exp2’.
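The representative-model rule (the member closest, in mean square deviation, to the cluster centroid) can be sketched as follows; the two-cluster toy data are hypothetical:

```python
import numpy as np

def representatives(data, labels):
    """Pick, from each cluster, the model closest (in mean square
    deviation) to the cluster centroid (mean over cluster members)."""
    reps = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = data[idx].mean(axis=0)
        msd = ((data[idx] - centroid) ** 2).mean(axis=1)
        reps.append(int(idx[np.argmin(msd)]))
    return sorted(reps)

# Toy check: two obvious clusters of three models each.
rng = np.random.default_rng(6)
a, b = rng.normal(size=(2, 100))
data = np.stack([a, a + 0.1, a - 0.1, b, b + 0.1, b - 0.1])
labels = np.array([1, 1, 1, 2, 2, 2])
reps = representatives(data, labels)
```

In this toy case the centroids coincide with the unshifted members, so models 0 and 3 are selected as representatives.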

[32] It should be emphasized that the analysis described so far can be carried out without any experimental data; that is, the dendrograms in Figure 2 can be reconstructed from the inter-correlation and mutual information among models alone. This is a desirable feature in emergency response applications, where the decision-making process may be based solely on the fast inter-comparison of model results, with no guarantee that a particular model corresponds to a correct representation of the evolution of the dispersion process: an immediate feedback on the mutual differences between models can effectively support the decision-making process. Moreover, the simultaneous comparison of all models is complicated from a practical point of view, while the selection of a small subset can quickly capture the most relevant features of plume dispersion and estimate the agreement among models.

[33] Table 1 shows the individual members of each subset. As can be inferred from this table, the statistics of the representative models span a wide interval of values; for example, the spatiotemporally averaged RMSE can be as large as 3.76 ng/m3 (model m15) or as low as 1.79 ng/m3 (model m22), both included in the Exp1 subset. The same can be said for the other statistical indexes. Similar considerations also hold for the models in the Exp2 subset.

3.2. The Comparison of Exp1 and Exp2 Models With Full Ensemble Model Results

[34] Exp1 and Exp2 models define two new ensembles, whose performances can be calculated (Table 2) and compared with the Median Model results in Table 1. Interestingly, the dissimilar performances of Exp1 and Exp2 models are reconciled when they are combined together. Also, it can be inferred that the average over only Exp1 or Exp2 models does not necessarily correspond to a loss of performance; in many cases the statistical indexes indicate an even better performance. Since the selection methodology is not based on the prior knowledge of experimental values, this satisfactory comparison suggests promising perspectives for the systematic reduction of ensemble data complexity.

Table 2. Root Mean Square Error (RMSE), Correlation Coefficient (CC), FA2, FA5 and FOEX Indexes Based on the Ensembles Made By the Subset of Exp1 and Exp2 Models(a)

        RMSE   CC     FA2     FA5    FOEX
Exp1    1.04   0.24   21.95   44.44    −1
Exp2    0.87   0.23   27.47   54.85    −7

(a) The RMSE dimensions are ng/m3.

[35] Galmarini et al. [2004a] defined APLp(x, y, t) as the pth percentile of the available model results at a specific space-time lattice point (x, y, t). Galmarini et al. [2004a] showed that the APL50, i.e. the Median Model results, compares favorably with the ETEX-1 experimental data, with ‘hot spots’ correctly reproduced and located, usually better than by any single model. Figures 3 and 4 show the ETEX measurements and the agreement in percentile level (APL50) for surface air concentration at 24, 48 and 60 hours from the release. The spatial distribution of the low concentration values remarkably resembles the measured one.

Figure 3.

Surface concentration from (left) observations and (right) 50th APL using all models adapted from Galmarini et al. [2004b], at (top) T0+24, (middle) T0+48 and (bottom) T0+60.

Figure 4.

The 50th APL from the subset of models selected on the basis of (left) uncorrelation and (right) mutual information criteria at (top) T0+24, (middle) T0+48 and (bottom) T0+60.

[36] One can suppose that the selection of a subset of ensemble models may be risky. However, the APL50 calculated over all models (Figure 3) and the APL50 calculated over the Exp1 and Exp2 models (Figure 4) show very similar patterns, although the plots in Figure 4 have been obtained using only five (in the case of uncorrelation) or four (in the case of mutual information) selected models: the selection of a subset of models does not necessarily deteriorate the graphical comparison of ensemble data with experimental ones, and the uncorrelation and mutual information criteria work well, at least with the ETEX-1 data.

[37] Galmarini et al. [2004b] commented that the Median Model results seem to be more conservative than those produced by single models, since they cut the upper and lower tails of ‘model outliers’, and this corresponds to a better area of superposition (i.e. the fraction of model results for which both the modeled and observed concentrations correspond to a value greater than a given threshold). This empirical evidence also holds for the Exp1 and Exp2 models; in Table 3 the area of superposition is reported for the Exp1 and Exp2 models. The area of superposition of these models covers a wide range of values, depending on the model and forecasting time. It can be verified that the area of superposition of the Median Model (column ‘MM’ of Table 3) is always larger than the average value of any single model in Table 3. More interestingly, the model selection procedure does not degrade this statistical index; in many cases the median over only the Exp1 or Exp2 models corresponds to an area of superposition even larger than that of the Median Model. It is also worth noticing that the area of superposition is rather low for some single models (e.g. see model m20 at late hours), but the medians over the Exp1 and Exp2 models, even if based on a small number of selected members, are able to circumvent this problem and restore results comparable with those obtained by the Median Model. Figure 5 graphically demonstrates this feature. Contour lines are drawn for a concentration of 0.1 ng/m3. Both the blue (Exp1 median) and yellow (Exp2 median) areas superpose well on the area corresponding to the observations (red line); moreover, the Exp1 and Exp2 areas of superposition closely resemble that corresponding to the Median Model results (green line).

Table 3. Area of Superposition for the Selected Representative Models at Different Times(a)

Time     m09    m10    m15    m20    m22    m23    m24    m26    MM     Exp1   Exp2
T0+06    1.00   0.41   0.75   0.37   0.89   0.27   0.62   0.42   0.73   0.89   0.57
T0+12    0.60   0.33   0.35   0.26   0.48   0.22   0.57   0.27   0.41   0.48   0.37
T0+18    0.39   0.35   0.24   0.48   0.47   0.33   0.45   0.22   0.41   0.39   0.47
T0+24    0.36   0.35   0.32   0.29   0.45   0.31   0.41   0.21   0.35   0.36   0.43
T0+30    0.37   0.35   0.31   0.24   0.42   0.25   0.65   0.19   0.40   0.41   0.46
T0+36    0.35   0.39   0.29   0.52   0.42   0.27   0.42   0.28   0.43   0.42   0.41
T0+42    0.45   0.36   0.35   0.25   0.53   0.42   0.49   0.07   0.49   0.54   0.56
T0+48    0.58   0.49   0.47   0.09   0.69   0.44   0.65   0.32   0.64   0.69   0.79
T0+54    0.50   0.49   0.25   0.00   0.57   0.40   0.41   0.43   0.60   0.58   0.57
T0+60    0.46   0.48   0.32   0.00   0.64   0.22   0.45   0.61   0.59   0.60   0.60

(a) ‘MM’ indicates the performance of the Median Model; ‘Exp1’ the performance of the median based on the subset of Exp1 models; ‘Exp2’ the corresponding performance based on Exp2 models. Only experimental values greater than 0.1 ng/m3 have been considered, in order to select only significant concentrations.
Figure 5.

Area of superposition between observations (red line), Median Model (green), median of Exp1 models (blue line) and median of Exp2 models (yellow line) at (top left) T0+24, (top right) T0+48 and (bottom) T0+60. Contour lines are drawn for a concentration of 0.1 ng/m3.
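As a sketch, the area of superposition can be approximated as the fraction of sampling locations where both the modeled and observed concentrations exceed the 0.1 ng/m3 threshold; the normalization by observed exceedances and the synthetic plume data below are assumptions for illustration (the paper's exact spatial definition may differ):

```python
import numpy as np

def area_of_superposition(model, obs, thr=0.1):
    """Fraction of locations where model and observation both exceed
    the threshold, among locations where the observation does
    (a sketch; the exact normalization is an assumption)."""
    obs_above = obs > thr
    both = (model > thr) & obs_above
    return both.sum() / obs_above.sum()

rng = np.random.default_rng(7)
obs = rng.gamma(1.0, 0.2, size=1000)           # hypothetical concentrations
good = obs * rng.uniform(0.8, 1.2, size=1000)  # model tracking the plume
bad = np.roll(obs, 500)                        # displaced plume
aos_good = area_of_superposition(good, obs)
aos_bad = area_of_superposition(bad, obs)
```

A model that tracks the plume scores near 1, while a displaced plume scores much lower, mirroring the behavior of the weaker single models in Table 3.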

[38] Finally, we checked the robustness of the selection procedure with the following experiment. The previous results were obtained using all available data to select the Exp1 or Exp2 models. We introduce here a ‘learning time’ for reconstructing the dendrograms, i.e. we checked to what extent the selection of representative models is influenced by the length of the time window used to reconstruct the dendrogram. To this end, we shortened the time window and applied the hierarchical clustering procedure using only data within a shorter initial time interval. Results are summarized in Table 4, showing the performance of the representative models when the selection is based on shortened time intervals (i.e. taking into consideration only data within the first twelve, twenty-four, etc., hours from the release). The selection methodology so outlined produces comparable results even in the case of a narrow time window. In all cases the repartition into uncorrelated/independent subsets is able to pick a few representative models with performance comparable to that of the whole data set, even using very limited time windows.

Table 4. Root Mean Square Error (RMSE), Correlation Coefficient (CC), FA2, FA5 and FOEX Indexes for the Medians of Representative Models Selected on the Correlation or Mutual Information Criteria as a Function of Learning Time(a)

Learning window        RMSE   CC     FA2     FA5    FOEX
From T0+00 to T0+12    1.04   0.24   21.95   44.44    −1
From T0+00 to T0+24    0.95   0.20   21.41   42.77    −6
From T0+00 to T0+36    1.37   0.28   20.36   43.05     6
From T0+00 to T0+48    1.16   0.28   20.83   44.26    16
From T0+00 to T0+60    1.04   0.24   21.95   44.44    −1
From T0+00 to T0+12    0.82   0.27   20.32   42.69   −19
From T0+00 to T0+24    0.73   0.21   19.54   41.25   −18
From T0+00 to T0+36    1.17   0.22   26.61   54.46    −9
From T0+00 to T0+48    0.97   0.22   19.04   42.82   −20
From T0+00 to T0+60    0.87   0.23   27.47   54.85    −7

(a) The time interval used to select the subset of uncorrelated/independent models is shown in the first column. The first five rows refer to the performance of models selected on the uncorrelation distance, while the last five rows refer to those selected on the mutual information distance. The statistical indexes have been calculated over all available data. The RMSE dimensions are ng/m3.

4. Conclusions

[39] Within recent years, the use of model ensembles has become an important forecasting component. Rather than being based on a single deterministic forecast, ensemble prediction relies on many simulations from the same (or different) models. Ensemble analysis has been successfully applied in the fields of climate, weather, air quality and radionuclide atmospheric dispersion prediction. There is empirical and theoretical evidence that ensemble results are usually superior to those of single models.

[40] In the past, ensemble approaches have been successfully applied to the ETEX-1 data set. In this work we apply data complexity reduction techniques to the same data set, demonstrating that the ensemble can be drastically reduced without sacrificing accuracy and, in some cases, with a further decrease in the root mean square error and an improvement in other statistics. The data complexity reduction strategy has been based on the analysis of inter-correlations or inter-dependencies between models; the methods partition a distance matrix and reconstruct a hierarchical tree from which a few representative models can be selected.

[41] We empirically demonstrated that, at least for the ETEX-1 data set, a few representative models correctly reproduce the results obtained from the full analysis of ensemble data when compared with observations. The RMSE, APL and area of superposition show an even better performance in many cases. Also, the performance of the selected models seems to be insensitive to the length of the time window used to reconstruct the hierarchical dendrogram, and both the uncorrelation and independence distances are able to select a subset of models whose data compare well with observations from the initial time of release.

[42] The possibility of carrying out this data analysis without any prior information on observed data, and the selection of a small number of models whose data compare with observations as well as those based on the whole ensemble, suggest the effectiveness of applying these methods for data complexity reduction purposes.

[43] We also note that the successful application of theoretically based data reduction techniques has the potential to promptly highlight the distinct information from among the total population of ensemble members and may guide the system development process, targeting efforts to alternative models/configurations that better bound the envelope of uncertainty.

[44] Though these results cannot be considered general, as they have been obtained from the analysis of a single data set, the positive results obtained by exploiting well-known information-theoretic criteria motivate further research in this direction. In the near future, it will be of great interest to apply data reduction techniques to data sets other than ETEX-1 and test their robustness.

[45] To our knowledge, the data reduction techniques described in this work have been applied to multimodel ensemble atmospheric dispersion predictions for the first time. Very recently, Pennell and Reichler [2011] attempted to estimate the effective number of models in climate simulations, also using decorrelation-based techniques, and Johnson et al. [2011a, 2011b] applied hierarchical clustering techniques to analyze precipitation fields from ensemble experiments, demonstrating the interest in, and general applicability of, these techniques in other fields of geophysical interest.
