## 1. Introduction

[2] Estimation of the frequency of extreme events is often required in hydrological practice. The procedures for analyzing a single set of data are well established, but observations of the same variable are often available at several measuring sites, and more accurate conclusions can be reached by analyzing many data samples together. This is the basis of regional frequency analysis [e.g., *Hosking and Wallis*, 1997]. The critical points of the regional approach are the choice of the method used to group the data samples and the assessment of the plausibility of the resulting groupings. The latter involves testing whether the proposed regions may be considered homogeneous. The hypothesis of homogeneity implies that the frequency distributions at different sites are the same, except for a site-specific scale factor.
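
The homogeneity hypothesis stated above can be written compactly. The following is only a sketch in notation assumed here, not taken from the text: $Q_i(F)$ denotes the quantile function at site $i$, $\mu_i$ the site-specific scale factor (the index value), $q(F)$ a dimensionless quantile function common to all sites, and $N$ the number of sites in the region.

$$
Q_i(F) = \mu_i \, q(F), \qquad i = 1, \dots, N, \quad 0 < F < 1 .
$$

Equivalently, under this hypothesis the rescaled variables $X_i / \mu_i$ follow the same distribution at every site, which is why the tests operate on data normalized by the index value.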

[3] Many authors have proposed homogeneity tests in the hydrologic literature, including *Dalrymple* [1960], *Wiltshire* [1986a, 1986b, 1986c], *Chowdhury et al.* [1991], *Lu and Stedinger* [1992], *Fill and Stedinger* [1995], and *Hosking and Wallis* [1993, 1997]. However, few comparisons between the tests have been carried out, which leaves users without clear guidance on the merits and drawbacks of each method. Statistics based on *L* moments [*Hosking and Wallis*, 1993, 1997] are now routinely used in regional analyses, but no detailed studies are available that demonstrate their superiority over other methods. Here we compare, in a very general setting, four homogeneity tests. The first two, proposed by *Hosking and Wallis* [1993], are based on *L* moment statistics. The other two are novel in the hydrologic field: the *k* sample Anderson-Darling test [*Scholz and Stephens*, 1987], suitably modified to account for the normalization by the index value, and the *Durbin and Knott* [1971] test, which is routinely used as a goodness of fit test but is used here to assess heterogeneity.

[4] The performance of these tests is assessed by determining their power through Monte Carlo simulation experiments. A more powerful homogeneity test offers the potential to reduce the error of quantile estimators, which is the final goal of a regional frequency analysis. However, this holds only when the significance level is chosen so as to maximize the benefits of the more powerful test. A full analysis of the problem would require disentangling the relation between the significance level and the power of the tests, which is itself a very complicated problem and goes beyond the scope of the present manuscript. Additional considerations on this topic are found at http://www.idrologia.polito.it/~alviglio/homtest.htm
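
To make the idea of estimating power by Monte Carlo simulation concrete, the following is a minimal sketch and not the procedure used in this study. Everything in it is an illustrative assumption: the Gumbel parent distribution, the five-site region with 30 observations per site, the function names, and the test statistic (the dispersion of the at-site sample L-CV, which is scale invariant and so needs no explicit normalization by the index value). The critical value is taken from simulated homogeneous regions, and the power is the fraction of simulated heterogeneous regions in which the statistic exceeds it.

```python
import math
import random


def gumbel_sample(rng, n, loc, scale):
    """Draw n Gumbel variates by inverse-transform sampling."""
    out = []
    for _ in range(n):
        u = max(rng.random(), 1e-12)  # guard against log(0)
        out.append(loc - scale * math.log(-math.log(u)))
    return out


def sample_lcv(x):
    """Sample L-CV, t = l2 / l1, from probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(j * v for j, v in enumerate(xs)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0


def dispersion(region):
    """Standard deviation of the at-site L-CVs (equal record lengths assumed)."""
    t = [sample_lcv(site) for site in region]
    mean_t = sum(t) / len(t)
    return math.sqrt(sum((ti - mean_t) ** 2 for ti in t) / len(t))


def simulate_region(rng, scales, n=30, loc=1.0):
    """One synthetic region: one Gumbel sample per site (hypothetical setup)."""
    return [gumbel_sample(rng, n, loc, s) for s in scales]


def rejection_rate(rng, scales, critical, reps):
    """Fraction of simulated regions whose statistic exceeds the critical value."""
    hits = sum(
        dispersion(simulate_region(rng, scales)) > critical for _ in range(reps)
    )
    return hits / reps


def estimate_power(seed=0, reps=400, alpha=0.05):
    """Return (empirical size, empirical power) of the illustrative test."""
    rng = random.Random(seed)
    homogeneous = [0.3] * 5                    # same L-CV at every site
    heterogeneous = [0.1, 0.2, 0.3, 0.4, 0.5]  # L-CV differs between sites

    # Critical value: empirical (1 - alpha) quantile under homogeneity.
    null_stats = sorted(
        dispersion(simulate_region(rng, homogeneous)) for _ in range(reps)
    )
    critical = null_stats[min(int((1 - alpha) * reps), reps - 1)]

    size = rejection_rate(rng, homogeneous, critical, reps)
    power = rejection_rate(rng, heterogeneous, critical, reps)
    return size, power


if __name__ == "__main__":
    size, power = estimate_power()
    print(f"empirical size: {size:.3f}, empirical power: {power:.3f}")
```

In this sketch the empirical size should stay close to the chosen significance level, while the power depends on how strongly the at-site scale parameters differ. Repeating the experiment for several test statistics under the same simulated regions is what allows their powers to be compared.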

[5] Section 2 describes the tests considered. Section 3 describes the procedure adopted to compare the tests, section 4 presents the results, and section 5 draws some conclusions.