Water Resources Research

Analytical confidence intervals for index flow flow duration curves

Abstract

[1] This study illustrates how analytical period-of-record flow duration curves (FDCs) and annual FDCs (AFDCs) of daily streamflow series developed in an index flow framework can be complemented by theoretical confidence intervals (CIs) deduced from the theory of nonparametric CIs for quantiles and fractional order statistics. By focusing on AFDCs, the proposed approach yields results very close to CIs for order statistics which are commonly used to construct AFDCs and CIs for AFDCs. When the method is applied to index flow FDCs, the comparison with Monte Carlo techniques allows the elucidation of some properties of FDCs which have not been previously explored in depth. The approach helps to overcome the problem of lack of CIs for index flow FDCs by introducing approximate analytical CIs based on an effective sample size. Thus, the underlying idea is emphasized that CIs of index flow FDCs and AFDCs can be coherently obtained by reasoning in terms of distribution of quantiles rather than distribution of order statistics. Moreover, a few results taken from nonparametric statistics allow the introduction of semiparametric index flow FDCs and AFDCs which are potentially useful for parsimonious regionalization procedures.

1. Introduction

[2] The percentage of time for which a given streamflow was equaled or exceeded over a historical period can be estimated by the so-called period-of-record flow duration curves (FDCs). FDCs are the survival functions of streamflows with a given time resolution (e.g., daily, weekly, or monthly) [Vogel and Fennessey, 1994]. Recalling that quantiles $x_P$ are values of a variable X (e.g., discharge) exceeded with a fixed probability P, an FDC can be described simply as a plot of $x_P$ versus P, where P is given by the complement of the distribution function F of X: $P = 1 - F(x_P) = \Pr[X > x_P]$.
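As a minimal sketch of this definition, the Python snippet below builds an empirical period-of-record FDC from a list of daily flows. The function name `empirical_fdc`, the sample values, and the plotting position i/(n + 1) are illustrative assumptions and are not taken from the original study.

```python
def empirical_fdc(flows):
    """Return (P, x_P) pairs, where P is the estimated exceedance probability.

    Assumption: P is estimated by the plotting position rank / (n + 1),
    with rank 1 assigned to the largest flow.
    """
    ordered = sorted(flows, reverse=True)  # largest flow has the smallest duration
    n = len(ordered)
    return [(rank / (n + 1), x) for rank, x in enumerate(ordered, start=1)]


# Made-up daily discharges, used only to show the call
daily_flows = [12.0, 3.5, 7.1, 1.2, 5.0, 9.8, 2.4, 4.4, 6.3]
for p, x in empirical_fdc(daily_flows):
    print(f"P = {p:.2f}   x_P = {x:.1f}")
```

With this convention the durations span [1/(n + 1), n/(n + 1)], which is the range of probabilities considered later for the order statistics.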

[3] FDCs are widely used in a number of applications [e.g., Smakhtin, 2001]. However, their use is often criticized because their interpretation depends on the period of record and because no procedures for computing theoretical CIs are available [e.g., Vogel and Fennessey, 1994, 1995]. To overcome these drawbacks, Vogel and Fennessey [1994] suggested reinterpreting FDCs on an annual basis by considering N annual FDCs (AFDCs), each corresponding to one of the N years of data. For daily data, each curve is a sequence of n = 365 values $X_i$, with i = 1,..,n, arranged in ascending order $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$, where $X_{i:n}$ is the ith order statistic [e.g., Kottegoda and Rosso, 2008]. AFDCs summarize the distribution functions of the n order statistics $X_{i:n}$ from the annual minima $X_{1:n}$ to the annual maxima $X_{n:n}$. Taking the median (or average) of the N values available for each $X_{i:n}$, it is possible to build a median (or average) AFDC, which represents a typical year wherein the interpretation is not affected by abnormal observations during the period of record [Vogel and Fennessey, 1994]. Moreover, percentiles other than the median can be taken into account to provide α percentiles of AFDCs, which can be used for constructing CIs for the median [e.g., Castellarin et al., 2007].
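The construction of median AFDCs and their empirical percentiles can be sketched as follows. The synthetic record (a lognormal yearly scale factor times lognormal daily values), the record length, and the helper `percentile_of_rank` are assumptions made only for illustration.

```python
import random
import statistics

random.seed(1)
N_YEARS, N_DAYS = 30, 365  # hypothetical record length

# Synthetic record (assumption): one scale factor per year times daily values;
# each year is sorted in ascending order, giving one AFDC per year.
afdcs = []
for _ in range(N_YEARS):
    scale = random.lognormvariate(0.0, 0.3)
    year = sorted(scale * random.lognormvariate(0.0, 1.0) for _ in range(N_DAYS))
    afdcs.append(year)


def percentile_of_rank(curves, i, alpha):
    """Empirical alpha percentile of the i-th order statistic across the years."""
    values = sorted(curve[i] for curve in curves)
    k = round(alpha * (len(values) - 1))  # nearest-rank index
    return values[k]


median_afdc = [statistics.median([c[i] for c in afdcs]) for i in range(N_DAYS)]
lower_afdc = [percentile_of_rank(afdcs, i, 0.05) for i in range(N_DAYS)]
upper_afdc = [percentile_of_rank(afdcs, i, 0.95) for i in range(N_DAYS)]

print("annual minima:", lower_afdc[0], median_afdc[0], upper_afdc[0])
print("annual maxima:", lower_afdc[-1], median_afdc[-1], upper_afdc[-1])
```

Each rank i is treated separately across the N years, which is why these empirical limits refer to order statistics rather than to quantiles.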

[4] To develop a mathematical model of the relationship between FDCs and AFDCs for obtaining regional models of AFDC to be applied at ungauged sites, Castellarin et al. [2004a] adopted an index flow method similar to that used in regional flood frequency analysis [e.g., Dalrymple, 1960; Hosking and Wallis, 1997]. The basic assumption of the model is that the daily streamflow X is the product of two random variables:

$$X = Q \cdot X' \qquad (1)$$

where the index flow Q summarizes the interannual precipitation variability and is often assumed to be the mean annual flow, whereas the dimensionless streamflow X′ and its distribution function $F_{X'}$ describe the hydrologic behavior of the basin. Under the hypothesis of independence of Q and X′, the distribution function of X can be written as [e.g., Kottegoda and Rosso, 2008, pp. 137–139]

$$F_X(x) = \int_{\Omega} F_Q\!\left(\frac{x}{x'}\right) f_{X'}(x')\,dx' \qquad (2)$$

where $f_{X'}$ is the probability density function of X′ and $F_Q$ is the distribution function of Q. The integral is computed over the domain Ω of the variable X′, and the theoretical FDC is given by $1 - F_X(x)$. The same index flow representation holds for the order statistics, which are simply observations arranged in ascending (or descending) order. Thus, the ith order statistic $X_{i:n}$ can be expressed as the product

$$X_{i:n} = Q \cdot X'_{i:n} \qquad (3)$$

Irrespective of the serial correlation of daily streamflow observations, Castellarin et al. [2004a] assume that the dimensionless daily discharges X′ are independent and identically distributed (iid), as this hypothesis does not influence FDCs and AFDCs. Moreover, under the iid assumption, the theoretical distribution of $X'_{i:n}$ can be deduced from the distribution of X′ as follows:

$$F_{X'_{i:n}}(x') = I\big(F_{X'}(x');\; i,\; n - i + 1\big) \qquad (4)$$

where $I(z; a, b) = \int_0^z t^{a-1}(1 - t)^{b-1}\,dt \,/\, B(a, b)$ is the beta cumulative distribution function. From equations (2), (3), and (4), it follows that the distribution function of $X_{i:n}$ is formally similar to the distribution of X:

$$F_{X_{i:n}}(x) = \int_{\Omega} F_Q\!\left(\frac{x}{x'}\right) f_{X'_{i:n}}(x')\,dx' \qquad (5)$$

where $f_{X'_{i:n}}$ is the probability density function of $X'_{i:n}$. The percentiles of the AFDCs corresponding to a given probability α can be obtained by inverting equation (5) for i = 1,..,n.
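The product form of the index flow distribution can be checked numerically. The sketch below assumes that the distribution function of X = Q · X′ is the integral of $F_Q(x/x') f_{X'}(x')$ over x′, and compares a midpoint-rule evaluation with a Monte Carlo estimate. The uniform distribution for Q and the exponential distribution for X′ are illustrative choices only; they are not the distributions used in the study.

```python
import math
import random

random.seed(42)


def cdf_q(q):
    """Distribution function of Q ~ Uniform(0.5, 1.5) (illustrative choice)."""
    return min(1.0, max(0.0, q - 0.5))


def pdf_xprime(x):
    """Density of X' ~ Exponential(1) (illustrative choice)."""
    return math.exp(-x)


def cdf_product(x, upper=40.0, steps=40000):
    """Midpoint-rule approximation of the integral of F_Q(x/x') f_X'(x') over x' > 0.

    The integration is truncated at `upper`, where the exponential tail is negligible.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        xp = (k + 0.5) * h
        total += cdf_q(x / xp) * pdf_xprime(xp)
    return total * h


# Monte Carlo check: simulate X = Q * X' with independent Q and X'
sample = [random.uniform(0.5, 1.5) * random.expovariate(1.0) for _ in range(50000)]
x0 = 1.0
empirical = sum(v <= x0 for v in sample) / len(sample)
analytical = cdf_product(x0)
print(f"F_X({x0}): integral = {analytical:.4f}, simulated = {empirical:.4f}")
```

Note that the simulation draws a new value of Q for every value of X′, which is the situation described by the integral; this is not the same as holding Q fixed within each year.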

[5] The above mentioned framework provides a method for linking FDCs and AFDCs. However, this model yields CIs only for AFDCs and not for FDCs. On the other hand, unlike AFDCs, FDCs can be used directly for filling gaps and for extending daily streamflow series or generating streamflow series at ungauged river basins [Castellarin et al., 2004b]. Thus, an ideal model should provide FDCs which are useful for simulation and are complemented with CIs. The theory of nonparametric CIs for quantiles and fractional order statistics allows the introduction of analytical CIs for FDCs in an index flow framework. In section 2, we discuss the mathematical arguments that lead to defining CIs for quantiles which follow a generic distribution function. Subsequently, the results are used to define parametric and semiparametric quantile-based CIs for index flow AFDCs and FDCs. These CIs are then compared with CIs obtained by equation (5), Monte Carlo simulations, and bootstrap resampling. Finally, discussion and conclusions are provided to complete the study.

2. Confidence Intervals for Order Statistics and Quantiles

[6] The definition of analytical CIs for quantiles is based on some concepts that refer to nonparametric statistics. The theoretical framework is nonparametric as it applies to very wide families of distributions $F_W$ and makes no use of functional forms or parameters of such forms [Mood et al., 1974, pp. 504–505]. Let $W_{1:n} \le \dots \le W_{n:n}$ be the order statistics from a sample of size n of a variable W with a generic distribution function $F_W$. An estimate of CIs for a given quantile $w_P$ can be obtained by using two order statistics as follows [e.g., Mood et al., 1974, pp. 512–514]:

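$$\Pr\left[W_{r:n} \le w_P \le W_{s:n}\right] = \sum_{i=r}^{s-1} \binom{n}{i} P^i (1 - P)^{n-i} = I(P;\, r,\, n - r + 1) - I(P;\, s,\, n - s + 1) = \pi \qquad (6)$$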
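The coverage of a CI built from two integer order statistics can be evaluated directly from the binomial sum. The sketch below assumes a continuous $F_W$ and takes P as the non-exceedance probability $F_W(w_P)$; the function names and the brute-force search over (r, s) are illustrative.

```python
from math import comb


def coverage(r, s, n, p):
    """Pr[W_{r:n} <= w_p <= W_{s:n}] for a continuous F_W, with p = F_W(w_p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))


def closest_interval(n, p, alpha):
    """Brute-force search for the pair (r, s) whose coverage is closest to 1 - alpha."""
    best = None
    for r in range(1, n):
        for s in range(r + 1, n + 1):
            gap = abs(coverage(r, s, n, p) - (1 - alpha))
            if best is None or gap < best[0]:
                best = (gap, r, s)
    _, r, s = best
    return r, s, coverage(r, s, n, p)


r, s, pi = closest_interval(n=20, p=0.5, alpha=0.10)
print(f"median, n = 20: r = {r}, s = {s}, coverage = {pi:.4f}")
```

Because r and s are integers, only a finite set of coverage values is attainable for a given n, so the target 1 − α is generally matched only approximately.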

[7] Any choice of r and s (with r < s) such that π = (1 − α) gives a nonparametric CI for $w_P$ with coverage probability 1 − α. This interval can be obtained by trying different values of r and s until π becomes close to (1 − α). However, when the sample size is small (a common situation, for instance, when working with annual maxima), the value of π can be very different from (1 − α) [Hutson, 1999]. This shortcoming can be overcome by resorting to fractional order statistics, which can be defined as $W_{n'P:n}$, where n′ = n + 1 [Stigler, 1977; Hutson, 1999]. Since the estimator of the Pth quantile based on the fractional order statistics is defined as $W_{n'P:n} = \hat{w}_P = \hat{F}_W^{-1}(P)$, when $F_W$ is known or fitted to data, an exact 100(1 − α)% CI for $w_P$ may be obtained by reformulating equation (6) as follows:

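$$\Pr\left[w_l \le w_P \le w_u\right] = I\big(P;\, n'P_l,\, n'(1 - P_l)\big) - I\big(P;\, n'P_u,\, n'(1 - P_u)\big) = 1 - \alpha \qquad (7)$$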

where $w_l$ and $w_u$ are the lower and upper limits of the CI for $w_P$, respectively, and $P_l = F_W(w_l)$ and $P_u = F_W(w_u)$. Equations (6) and (7) have been developed in a nonparametric context to compute interval estimates of quantiles without formulating hypotheses about $F_W$. In this case, the limits $w_l$ and $w_u$ have to be estimated by $W_{n'P_l:n}$ and $W_{n'P_u:n}$, which in turn cannot be computed directly from the sample since n′P need not be an integer. This explains the very limited use of the above formulation for computing CIs for quantiles. However, this shortcoming does not arise at all when $F_W$ (and hence $w_l = F_W^{-1}(P_l)$ and $w_u = F_W^{-1}(P_u)$) is known or fitted to data. Moreover, from equation (7), by replacing $w_l$ and $w_u$ with a generic w, it follows that

$$F_{W_P}(w) = \Pr\left[W_P \le w\right] = 1 - I\big(P;\, n'F_W(w),\, n'(1 - F_W(w))\big) \qquad (8)$$

[8] Equation (8) provides the distribution function of quantiles $W_P$; it may be inverted numerically, but, to the best of our knowledge, an analytical expression of its derivative with respect to w is not available. Fractional order statistics also allow generalizing the distribution of integer order statistics in equation (4) by replacing $X'_{i:n}$ with $W_{n'P:n}$:

$$F_{W_{n'P:n}}(w) = I\big(F_W(w);\, n'P,\, n'(1 - P)\big) \qquad (9)$$

[9] Equation (9) describes the distribution function of fractional order statistics $W_{n'P:n}$. Hutson [1999] stated that equation (9) approximates equation (8) very well for the median quantile (P = 0.5), whereas Serinaldi [2009] explained that the approximation is rather satisfactory for quantiles with probability P ∈ [1/(n + 1), n/(n + 1)] independently of the form of $F_W$. An illustrative example of the difference between the two distributions is shown in Figure 1, wherein it is assumed that W is a standard Gumbel variable, n = 20, and P = {1/(n + 1) ≈ 0.05, 0.25, 0.5, 0.75, n/(n + 1) ≈ 0.95}. The distributions are very close to each other for P ∈ [0.25, 0.75], and the differences become more evident when the focus is on extreme quantiles [Serinaldi, 2009]. For further details on the comparison of equations (8) and (9), the readers can refer to the above mentioned references.

Figure 1.

Examples of distribution functions of quantiles $F_{W_P}$ (equation (8)) and order statistics $F_{W_{n'P:n}}$ (equation (9)) for a standard Gumbel variable W with n = 20 and P = {1/(n + 1), 0.25, 0.5, 0.75, n/(n + 1)}.

[10] Since equations (8) and (9) were developed in a nonparametric framework, they hold in principle for every distribution function $F_W$. Commonly, $F_W$ is a parametric distribution (e.g., Gaussian, lognormal, or log-Pearson III), whose parameters are fitted to data. However, in some cases (e.g., for samples with large size), the fit of parametric families is unsatisfactory, and nonparametric alternatives can be more suitable. The theory of fractional order statistics provides a nonparametric distribution potentially useful for practical applications. Originally, fractional order statistics were developed as a purely technical device for defining a continuum of order statistics in order to facilitate large-sample theory calculations involving linear combinations of order statistics [Hutson, 1999]. Stigler [1977] demonstrated that the distribution of the fractional order statistic $W_{n'P:n} = \hat{w}_P$ can be approximated by the distribution of the linear interpolation estimator of the quantile function:

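$$\hat{w}_P = (1 - \varepsilon)\, W_{\lfloor n'P \rfloor:n} + \varepsilon\, W_{\lfloor n'P \rfloor + 1:n} \qquad (10)$$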

where ε = n′P − ⌊n′P⌋, and ⌊·⌋ denotes the floor function. Equation (10) represents the quantile estimator introduced by Parzen [1979] and applied by Vogel and Fennessey [1994] for defining nonparametric AFDCs. Since its applicability is limited to the range of probabilities [1/(n + 1), n/(n + 1)], it was extended by Hutson [2002], who introduced exponential tails to account for extreme quantiles. The inverse of Hutson's [2002] quantile function defines a nonparametric estimate of $F_W$ as follows:

equation image

where $W_{j:n}$ and $W_{k:n}$ are the order statistics closest to w and the interpolation weight is $(w - W_{j:n})/(W_{k:n} - W_{j:n})$. This distribution function allows the extrapolation to extreme probabilities and performs reasonably well for data from midtailed to light-tailed distributions. In sections 3–5, it is shown how equations (8), (9), and (11) are introduced in the index flow framework to define parametric and semiparametric FDCs and AFDCs and their CIs.
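The linear interpolation estimator of the quantile function can be sketched as below, assuming that it interpolates between the two order statistics that bracket the fractional rank n′P, with n′ = n + 1, and that P denotes the non-exceedance probability as in section 2. The function name is hypothetical, and the exponential tail extension of Hutson [2002] is not implemented: probabilities outside [1/(n + 1), n/(n + 1)] are simply rejected.

```python
import math


def interpolated_quantile(sample, p):
    """Linear-interpolation quantile estimate at non-exceedance probability p."""
    w = sorted(sample)
    n = len(w)
    h = (n + 1) * p                      # fractional rank n'P, with n' = n + 1
    if h < 1 or h > n:
        raise ValueError("p must lie in [1/(n+1), n/(n+1)]")
    j = math.floor(h)                    # integer part of the fractional rank
    eps = h - j                          # interpolation weight
    if j == n:                           # p = n/(n+1): the largest observation
        return w[-1]
    return (1 - eps) * w[j - 1] + eps * w[j]


data = [40.0, 10.0, 30.0, 20.0]          # made-up sample
print(interpolated_quantile(data, 0.5))  # halfway between the 2nd and 3rd order statistics
```

For P = 0.5 and n = 4 the fractional rank is 2.5, so the estimate falls midway between the second and third order statistics.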

3. Quantile-Based Confidence Intervals for AFDCs

[11] According to the index flow approach, $X_P = Q \cdot X'_P$. Thus, the AFDCs in terms of quantiles have the same form as the AFDCs on the basis of order statistics (equation (5)), and they can be obtained by simply replacing the probability density function of the order statistics $f_{X'_{i:n}}$ with the probability density function $f_{X'_P}$ of the quantiles $X'_P$, resulting in

$$F_{X_P}(x) = \int_{\Omega} F_Q\!\left(\frac{x}{x'}\right) f_{X'_P}(x')\,dx' \qquad (12)$$

where $F_{X'_P}$, of which $f_{X'_P}$ is the derivative, is given by equation (8) for $W_P = X'_P$, $F_W = F_{X'}$, and n′ = 365 + 1. As the explicit expression of the derivative of $F_{X'_P}$ is not available, integrating by parts allows equation (12) to be rewritten as [Castellarin et al., 2004a]

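$$F_{X_P}(x) = F_Q\!\left(\frac{x}{x'_u}\right) + \int_{\Omega} F_{X'_P}(x')\, f_Q\!\left(\frac{x}{x'}\right) \frac{x}{x'^2}\,dx' \qquad (13)$$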

where $x'_u$ is the upper limit of the domain Ω of X′. Equation (13) is more tractable than equation (12), as $F_Q$ is often chosen among easily differentiable distributions (e.g., logistic or gamma). Similar to AFDCs in equation (5), the quantile-based CIs for AFDCs corresponding to a given probability α can be obtained by solving equation (13).

[12] Two examples help to point out similarities and differences between equation (5) and (13). In the first example, CIs for AFDCs from equations (5) and (13) have been compared for a daily streamflow series analyzed by Castellarin et al. [2004a]. In this case, the model (the so-called LO-GPA) is fully parametric and is based on the assumption that log Q and X′ follow a two-parameter logistic (LO) distribution and a three-parameter generalized Pareto (GPA) distribution, respectively. The values of the parameters (estimated by L moments method) as well as the details on river basins and streamflow data are provided by Castellarin et al. [2004a]. Theoretical results were compared with empirical values obtained by Monte Carlo simulations that were set up as follows. Since 40 years of daily data were recorded at this site, 40 series of size 365 were simulated from GPA distribution to mimic X′, and each series was multiplied by a value of Q drawn from the fitted LO distribution. The 40 series of X = Q · X′, arranged in ascending order, mimic 40 AFDCs, and their ensemble is a simulated 40 year period-of-record FDC (this simulation method is denoted as SIM.FDC).
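The SIM.FDC scheme described above can be sketched as follows. All parameter values are hypothetical placeholders, not the values estimated by Castellarin et al. [2004a], and the GPA quantile function is written in the form x(F) = ξ + α[1 − (1 − F)^k]/k, which is assumed here to be the intended parametrization.

```python
import math
import random

random.seed(2024)

# Hypothetical parameter values, chosen only for illustration
GPA_XI, GPA_ALPHA, GPA_K = 0.05, 0.85, -0.10   # location, scale, shape of X'
LO_XI, LO_ALPHA = 2.0, 0.15                    # location, scale of log Q


def draw_gpa():
    """Inverse-transform draw from a generalized Pareto distribution (k != 0)."""
    u = random.random()
    return GPA_XI + GPA_ALPHA * (1.0 - (1.0 - u) ** GPA_K) / GPA_K


def draw_index_flow():
    """Draw Q such that log Q follows a two-parameter logistic distribution."""
    u = random.random()
    while u == 0.0:                            # avoid log(0)
        u = random.random()
    return math.exp(LO_XI + LO_ALPHA * math.log(u / (1.0 - u)))


def sim_fdc(n_years, n_days=365):
    """One SIM.FDC realization: a list of n_years AFDCs in ascending order."""
    curves = []
    for _ in range(n_years):
        q = draw_index_flow()                  # a single value of Q per year
        curves.append(sorted(q * draw_gpa() for _ in range(n_days)))
    return curves


def limits(curves, i, lo=0.05, hi=0.95):
    """Empirical lower and upper limits of the i-th order statistic."""
    values = sorted(c[i] for c in curves)
    m = len(values) - 1
    return values[round(lo * m)], values[round(hi * m)]


afdcs = sim_fdc(n_years=40)
for i in (0, 182, 364):
    low, high = limits(afdcs, i)
    print(f"rank {i + 1:3d}: 5% limit = {low:.2f}, 95% limit = {high:.2f}")
```

With only 40 simulated years, the empirical limits are noisy; increasing `n_years` makes them smoother.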

[13] Empirical and analytical 5% and 95% confidence limits of AFDCs are shown in Figure 2a along with the simulated samples. As expected, given the agreement between equations (8) and (9) (i.e., between $F_{X'_P}$ and $F_{X'_{n'P:n}}$), the CIs for AFDCs based on order statistics and quantiles are almost equal as the dimensionless duration P ranges within the interval [1/(n + 1), n/(n + 1)]. A few differences arise for small dimensionless durations (corresponding to annual maxima), since equations (8) and (9) are influenced by the upper tail behavior of $F_{X'}$. Moreover, the discrepancies between the analytical and empirical CIs are related to the small number of simulated AFDCs, which was chosen equal to the observed sample size (40 years); however, these differences tend to disappear as the number of simulated AFDCs increases.

Figure 2.

(a) Simulated AFDCs and empirical and analytical CIs for AFDCs (equations (5) and (13)) for the Potenza River streamflow series studied by Castellarin et al. [2004a]. (b) Empirical AFDCs and empirical and analytical CIs of AFDCs for the Quaboag River data.

[14] The second example presents the applicability of the semiparametric model constructed using equation (11) to describe $F_{X'}$. The data consist of 77 years of daily streamflow records from the Quaboag River at West Brimfield, Massachusetts (U.S. Geological Survey (USGS) station code 01176000) spanning from 1912 to 1989. The data set, retrieved from the USGS Web site (http://waterdata.usgs.gov/nwis), was already studied by Vogel and Fennessey [1994]. The LO distribution was used to model log Q, and the parameters were estimated by the L moments method. The results are shown in Figure 2b. The agreement between the empirical and semiparametric CIs for AFDCs can be questioned for dimensionless durations <0.1 and >0.9. However, the disagreement can be ascribed to some intrinsic shortcoming of the index flow approach rather than to the accuracy of the method used to compute the CIs. In this example, the LO distribution showed quite a good agreement with log Q (Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling goodness-of-fit tests [e.g., Laio, 2004] did not reject the null hypothesis at 1% significance level), and the nonparametric distribution in equation (11) provided an almost perfect fit for X′. Nevertheless, the index flow method assumes that the AFDCs should be almost parallel as they should collapse to a single curve (i.e., the annual pattern of X′) when they are divided by Q (which is constant for each year). The simulated samples in Figure 2a exhibit this behavior, whereas the observed AFDCs in Figure 2b show some departures, especially on the tails.

[15] In spite of the interest for the performance of FDC models for low and high durations, it is worth noting that often the behavior of FDCs and AFDCs for these durations is not properly visualized, making it difficult to perform possible comparisons. For example, FDCs and AFDCs are often plotted using the linear scale for P, resulting in plots that show strong curvatures for high and/or low durations (say, P > 0.95 and P < 0.05), so that assessing the agreement between the models and empirical counterpart is not easy [e.g., LeBoutillier and Waylen, 1993; Sugiyama et al., 2003; Castellarin et al., 2004a, 2007; Iacobellis, 2008; Niadas and Mentzelopoulos, 2008; Shao et al., 2009]. Moreover, some studies focus on limited ranges of durations [e.g., Yu et al., 2002]. A more appropriate visualization requires stretched abscissas, resorting, for instance, to lognormal probability plots [Vogel and Fennessey, 1994; Ganora et al., 2009]. Croker et al. [2003] used stretched abscissas, showing durations P ∈ [0.01, 0.99], whereas Hughes and Smakhtin [1996] and Smakhtin and Masse [2000] provide two of the few examples wherein extreme durations are properly visualized (P ∈ [0.0001, 0.9999]). Therefore, the fitting performance shown in Figure 2b is in agreement with the results reported in the literature for durations P ∈ [0.05, 0.95] [e.g., Castellarin et al., 2004a; Iacobellis, 2008]. Moreover, the most relevant point is that percentiles of the semiparametric AFDCs from equations (5) and (13) are very close to each other, except for unavoidable discrepancies arising, for instance, at P = 365/366. Thus, dealing with AFDCs, the difference between equations (5) and (13) is conceptual rather than practical; the main advantage of CIs for quantiles is their wider applicability, as is shown in section 4.

4. Quantile-Based Confidence Intervals for X = Q · X′

[16] Equation (13) describes the propagation of the distribution (and related uncertainty) of annual dimensionless streamflows X′ taking into account the impact of the index flow Q. Unlike AFDCs, FDCs (equation (2)) focus on a period-of-record sample, which can be considered as the ensemble of N annual subsamples. Hence, the uncertainty of FDCs can be quantified by the CIs for quantiles of the random variable X = Q · X′, which follows the distribution $F_X$ in equation (2). Setting W = X and $F_W = F_X$ in equation (8), the distribution function of $X_P$ may be written as

$$F_{X_P}(x) = 1 - I\big(P;\, n'F_X(x),\, n'(1 - F_X(x))\big) \qquad (14)$$

where n′ = 365N + 1. Similar to AFDCs, CIs for FDCs corresponding to a given probability α can be obtained by solving equation (14).

[17] To better understand the meaning of equation (14), we have simulated 100 synthetic series by two different approaches based on the LO-GPA distribution with parameters corresponding to the Potenza River example. The first method is SIM.FDC, wherein each series of size 365 × 40 was simulated by multiplying 40 blocks of 365 simulations from the GPA distribution (mimicking 40 years of X′) by 40 values of Q drawn from the LO distribution. This approach is coherent with the rationale of the FDCs; however, it does not reflect the behavior of the LO-GPA population described by equation (2), as is shown later. The second method implies that LO-GPA samples of size 365 × 40 are simulated by generating standard uniform samples and inverting equation (2) or, alternatively, by multiplying 365 × 40 values from the GPA distribution by 365 × 40 different values drawn from the LO distribution (hereinafter, this method is denoted as SIM.F). In Figure 3a, 100 SIM.F series of size 365 × 40 (arranged in descending order) are superimposed on 100 SIM.FDC sequences. SIM.FDC series are characterized by a variability higher than SIM.F and by a different lower tail behavior. To analyze these differences, we have computed the empirical 90% CIs from the simulated FDCs. In more detail, these CIs were built by considering the 5th and 95th percentiles of the 100 realizations corresponding to each available value of P ∈ [1/(365 × 40 + 1),(365 × 40)/(365 × 40 + 1)]. This method is the same as that used for computing CIs of AFDCs and provides CIs for order statistics, which are very close to CIs for quantiles [Serinaldi, 2009].
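The difference between the two simulation schemes can be illustrated with a small experiment that compares the variability of the period-of-record median under SIM.FDC and SIM.F. The parameter values are hypothetical, and the record length and number of series are reduced (10 years, 50 series) only to keep the run short.

```python
import math
import random
import statistics

random.seed(11)

# Hypothetical parameter values, chosen only for illustration
GPA_XI, GPA_ALPHA, GPA_K = 0.05, 0.85, -0.10   # location, scale, shape of X'
LO_XI, LO_ALPHA = 2.0, 0.15                    # location, scale of log Q
N_YEARS, N_DAYS, N_SERIES = 10, 365, 50


def draw_gpa():
    """Inverse-transform draw of X' from a generalized Pareto distribution."""
    u = random.random()
    return GPA_XI + GPA_ALPHA * (1.0 - (1.0 - u) ** GPA_K) / GPA_K


def draw_index_flow():
    """Draw Q such that log Q follows a logistic distribution."""
    u = random.random()
    while u == 0.0:                            # avoid log(0)
        u = random.random()
    return math.exp(LO_XI + LO_ALPHA * math.log(u / (1.0 - u)))


def sim_fdc_series():
    """SIM.FDC: one value of Q shared by the 365 daily values of each year."""
    series = []
    for _ in range(N_YEARS):
        q = draw_index_flow()
        series.extend(q * draw_gpa() for _ in range(N_DAYS))
    return series


def sim_f_series():
    """SIM.F: a different value of Q for every daily value."""
    return [draw_index_flow() * draw_gpa() for _ in range(N_YEARS * N_DAYS)]


medians_fdc = [statistics.median(sim_fdc_series()) for _ in range(N_SERIES)]
medians_f = [statistics.median(sim_f_series()) for _ in range(N_SERIES)]
spread_fdc = statistics.stdev(medians_fdc)
spread_f = statistics.stdev(medians_f)
print(f"std of the median: SIM.FDC = {spread_fdc:.3f}, SIM.F = {spread_f:.3f}")
```

In SIM.FDC the pooled sample depends on only `N_YEARS` values of Q, so the sample median varies much more from series to series than in SIM.F, where each observation carries its own Q.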

Figure 3.

(a) Hundred ordered LO-GPA sequences of size 365 × 40 simulated by SIM.F and SIM.FDC methods (see text for further details). (b and c) Comparison of analytical 90% CIs and Monte Carlo CIs computed on SIM.F and SIM.FDC simulations. The parameters of the LO-GPA model refer to the Potenza River example.

[18] The empirical 90% CIs for SIM.F and SIM.FDC are then compared with the analytical CIs provided by equation (14). Figures 3b and 3c point out that equation (14) describes the CIs for LO-GPA quantiles corresponding to SIM.F (and equation (2)) rather than the uncertainty of the FDCs generated by the algorithm SIM.FDC. It should be noted that the stepwise behavior of analytical CIs for very low and high durations in Figure 3b is related to numerical approximations which affect the computation for very small and high probabilities (P < 0.001 and P > 0.999). The differences between empirical and analytical CIs highlight that equation (2) is the distribution function of the product of two independent random variables (Q and X′), so that $X_i = Q_i \cdot X'_i$, with i = 1,..,n, whereas an index flow FDC describes the product of n realizations of X′ by N = n/365 realizations of Q. The discrepancy between the sample size of Q and X′ results in blocks of observations of X = Q · X′ (with size 365) wherein Q is unique and the observations of X cannot be considered independent. It should be noted that this lack of independence is not related to the serial correlation of the streamflow series but to the FDC structure. Thus, equation (14) yields exact CIs for quantiles that follow the distribution in equation (2), which in turn is not properly a period-of-record FDC.

5. Approximate Quantile-Based Confidence Intervals for FDCs

[19] Given that equation (14) does not yield CIs for an FDC, the discussion in section 4 allows the derivation of approximate analytical CIs. As FDCs involve the product of n realizations of X′ by n/365 realizations of Q, resulting in blocks of dependent observations of X = Q · X′, it is reasonable to account for this lack of independence by introducing an effective sample size $n_{\mathrm{eff}}$ to be used in equation (14). The working hypothesis is that n values of X contain less information than the corresponding n values of X′ but more information than the N = n/365 realizations of Q involved in an FDC. Since X = Q · X′, the effective size of X should also be the effective size of Q and X′. These assumptions allow the introduction of a factor ϕ that reduces n and amplifies N, so that

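$$n_{\mathrm{eff}} = n \cdot \phi = \frac{N}{\phi} \quad \Rightarrow \quad \phi = \sqrt{N/n} = \frac{1}{\sqrt{365}} \qquad (15)$$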

[20] Figure 4 compares the Monte Carlo 90% CIs and the CIs computed by equation (14) for the Potenza River example, introducing $n_{\mathrm{eff}}$, with N equal to 10, 20, and 40 years. The reduction factor ϕ provides an appropriate correction for all values of N and durations P ∈ [0.001, 0.999], meaning that ϕ can help to overcome, at least partially, the problems related to the different sample size of Q and X′ involved in the derivation of FDCs. However, it should be recalled that equation (14) does not give exact CIs as it relies on equation (2), which in turn is not properly a period-of-record FDC, but the distribution of $X_i = Q_i \cdot X'_i$, with i = 1,..,n.
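A short numerical illustration of the effective sample size follows. It assumes that the correction amounts to n_eff = n · ϕ = N/ϕ with n = 365N, which implies ϕ = √(N/n) = 1/√365; this reading is inferred from the text and the figure captions and should be checked against the original equation (15). The function name is hypothetical.

```python
import math

DAYS_PER_YEAR = 365


def effective_sample_size(n_years):
    """Effective sample size, assuming n_eff = n * phi = N / phi with n = 365 * N."""
    n = DAYS_PER_YEAR * n_years
    phi = math.sqrt(n_years / n)       # equals 1 / sqrt(365), independent of N
    return n * phi


for n_years in (10, 20, 40, 77):
    n = DAYS_PER_YEAR * n_years
    print(f"N = {n_years:2d}  n = {n:5d}  n_eff = {effective_sample_size(n_years):7.1f}")
```

Under this assumption each year of record counts as roughly 19 independent observations rather than 365, so the resulting CIs are wider than those obtained with the full n but narrower than those based on N alone.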

Figure 4.

Comparison of Monte Carlo 90% CIs computed on SIM.FDC simulated series and analytical CIs from equation (14) with $n_{\mathrm{eff}} = N/\phi$ for N = 10, 20, 40 years. The CIs refer to the LO-GPA model for the Potenza River data.

[21] Analogous to AFDCs, the FDC and corresponding CIs were also computed for the Quaboag River data. In this case, data availability allows the definition of bootstrap CIs based on the following procedure: given 77 years of daily observations, we have defined 77 blocks of size 365 (each one corresponding to 1 year) for X′ and the 77 values of the mean annual discharge Q; under the hypothesis of independence of X′ and Q, the blocks of X′ and the values of Q were sampled with replacement 77 times, and each block of X′ was multiplied by a value of the resampled Q to obtain a new 77 year bootstrap FDC; the previous step was repeated B = 100 times, resulting in 100 bootstrap FDCs; finally, empirical 75% and 90% CIs were computed as is described in section 4 for Monte Carlo simulations. Analytical 75% CIs were computed using both n = 365 × 77 and neff = n · ϕ and the semiparametric model. The results are shown in Figure 5. All the approaches yield similar results for P ∈ [0.01, 0.99]; however, this is the less interesting range of durations as the uncertainty is very small owing to the large sample size associated with the FDC. For extreme durations, the bootstrap CIs can be assumed as the benchmark CIs, since they do not involve any model hypothesis. The bootstrap CIs enclose the empirical FDC for high durations (small discharges), whereas they tend to underestimate high discharges (for small durations). As the bootstrap FDCs are obtained by multiplying bootstrap series and values of X′ and Q, particular combinations of bootstrap X′ and Q can yield values smaller or higher than those observed, allowing the extrapolation of extreme values. However, in the present example, we obtain low flows smaller than the observed discharges, but the method is not able to yield high flows greater than the observed ones, as there are no combinations of bootstrap X′ and Q that yield high discharges more extreme than the observed combination. 
This combination is probably the most extreme possible combination for the data on hand. Therefore, even though the above mentioned bootstrap method can allow the extrapolation, this depends on the data on hand. For Quaboag River data, the bootstrap approach is not able to extrapolate high flows, resulting in bootstrap CIs that do not enclose the empirical FDC for the smallest durations.
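The bootstrap procedure described in this paragraph can be sketched as follows. The synthetic record stands in for the observed data, and the record length is reduced to 20 years to keep the run short; both are assumptions made for illustration, as are the function names.

```python
import random

random.seed(7)
N_YEARS, N_DAYS, B = 20, 365, 100   # hypothetical sizes (the study uses 77 years)

# Synthetic "observed" record (assumption): lognormal daily flows with a yearly scale
record = []
for _ in range(N_YEARS):
    scale = random.lognormvariate(0.0, 0.3)
    record.append([scale * random.lognormvariate(0.0, 1.0) for _ in range(N_DAYS)])

q_obs = [sum(year) / N_DAYS for year in record]                     # mean annual flow Q
blocks = [[x / q for x in year] for year, q in zip(record, q_obs)]  # annual blocks of X'


def bootstrap_fdc():
    """Resample blocks of X' and values of Q independently, with replacement."""
    pooled = []
    for _ in range(N_YEARS):
        block = random.choice(blocks)
        q = random.choice(q_obs)
        pooled.extend(q * x for x in block)
    return sorted(pooled, reverse=True)   # descending order, as in an FDC


boot = [bootstrap_fdc() for _ in range(B)]


def band(rank, level):
    """Empirical CI of the flow at a given rank across the B bootstrap FDCs."""
    values = sorted(fdc[rank] for fdc in boot)
    tail = (1.0 - level) / 2.0
    m = len(values) - 1
    return values[round(tail * m)], values[round((1.0 - tail) * m)]


n = N_YEARS * N_DAYS
for rank in (0, n // 2, n - 1):
    print(rank, band(rank, 0.75), band(rank, 0.90))

# No bootstrap value can exceed the product of the largest Q and the largest X'
largest_possible = max(q_obs) * max(max(b) for b in blocks)
```

The last line makes the extrapolation limit explicit: the resampled flows are bounded above by the largest observed Q times the largest observed X′, so the bootstrap cannot generate high flows beyond that product.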

Figure 5.

The bootstrap 75% and 90% CIs (light grey and dark grey areas, respectively) obtained by resampling the Quaboag River FDC (black line). Bootstrap CIs are compared with analytical 75% CIs computed by equation (14) with n = 365 × 77 (grey bars) and $n_{\mathrm{eff}} = \sqrt{365} \times 77$ (black bars).

[22] For P < 0.01 and P > 0.99, the three methods (bootstrap, analytical with n, and analytical with $n_{\mathrm{eff}}$) give different results. The differences between analytical CIs computed with n and $n_{\mathrm{eff}}$ are coherent with those shown in Figure 2 for the LO-GPA model. The bootstrap CIs must be compared with the analytical CIs on the basis of the effective sample size $n_{\mathrm{eff}}$. The discrepancies between these CIs are evident for P > 0.99 and $P < 2 \times 10^{-4}$ and can be ascribed to the parametric part (LO distribution of log Q) of the semiparametric model. The choice of a parametric component in the index flow modeling has a strong impact on the resulting FDC, especially on the tails, and affects analytical CIs, as they rely on the index flow model in equation (2): when this model is correctly specified (as shown in Figures 2 and 3), then the corresponding analytical CIs are also reliable; otherwise, strong discrepancies can appear.

[23] In assessing the performance of the bootstrap and analytical CIs for FDCs, it is worth keeping in mind two aspects. First, both methods suffer some shortcomings: commonly, the bootstrap approach does not allow the extrapolation, while the analytical method depends on the performance of the adopted parametric or semiparametric model. Second, dealing with FDCs, the focus is on really extreme durations, as the uncertainty corresponding to middle durations is negligible owing to the large sample size (n = 365N). Therefore, the comparison of bootstrap and analytical CIs can provide, to some extent, a cross check of the results, but it does not guarantee definitely the reliability of the computed CIs for FDCs.

6. Discussion and Conclusions

[24] The introduction of analytical CIs for quantiles provides a coherent and general approach to further study the relationships between index flow FDCs and AFDCs, overcoming some shortcomings of the distribution functions of order statistics. In particular, this approach helps to point out that the analytical formulation of index flow FDCs properly describes the product of two continuous random variables (here, Q and X′), but it is not able to account for the effects of the different sample size of Q and X′ involved in the definition of FDCs. Nevertheless, reasoning in terms of CIs for quantiles, we can introduce approximate analytical CIs that agree with Monte Carlo CIs rather well when the model is correctly specified. Recalling that the index flow approach involves few parameters, as it is designed to provide a parsimonious description of FDCs and AFDCs, the possible misspecification of the behavior of the tails is common in real-world applications. In these cases, neither the nonparametric bootstrap nor the analytical method could give accurate CIs; however, their comparison allows for a useful, mutual check. By focusing on the AFDCs, the main cause of the discrepancies between empirical CIs and analytical CIs can be ascribed to the departures from the hypothesis that the annual sequences of X′ are identically distributed (see section 3 and Figure 2b). When this assumption is fulfilled, analytical CIs are very close to the empirical CIs (Figure 2a), supporting the overall correctness of the quantile-based method.

[25] Using equation (11) to describe the distribution of X′, equations (2), (13), and (14) provide semiparametric models that allow exploiting nonparametric regionalization procedures. For example, Ganora et al. [2009] derive dimensionless FDCs for observations of X′ = X/Q that refer to a number of stream gauges located in a given area; therefore, homogeneous groups of FDCs patterns are selected from the whole set of dimensionless FDCs, according to a clustering approach that defines homogeneous regions; finally, the regional curves are derived as the average of all curves belonging to each cluster or region. As these regional dimensionless FDCs may be difficult to model owing to possible irregular behavior (especially on the tails), equation (11) is suggested as a possible nonparametric alternative to parametric families, reducing the overall number of the parameters of the index flow model. Coupling equation (11) for X′ with a suitable parametric distribution for Q through equation (2) results in a complete regional semiparametric model for X. However, a more detailed discussion of this approach goes beyond the scope of the present study and will be the subject of future communications.

[26] Finally, it is worth noting that the introduction of quantile-based CIs and semiparametric models does not overcome the possible shortcomings of the index flow method. Nevertheless, the concepts introduced in this study help to investigate index flow FDCs from an alternative point of view, shedding a new light on some of their properties and indicating possible directions for further developments.

Acknowledgments

[27] The author thanks Attilio Castellarin (Università di Bologna, Italy), John F. England Jr. (Bureau of Reclamation, United States), an anonymous reviewer, and the Associate Editor for their insightful comments that helped to greatly improve the quality of the original manuscript.
