Index flood–based multivariate regional frequency analysis

Abstract

[1] Because of their multivariate nature, several hydrological phenomena can be described by more than one correlated characteristic. These characteristics are generally not independent and should be jointly considered. Consequently, univariate regional frequency analysis (FA) cannot provide complete assessment of true probabilities of occurrence. The objective of the present paper is to propose a procedure for regional flood FA in a multivariate framework. In the present paper, the focus is on the estimation step of regional FA. The proposed procedure represents a multivariate version of the index flood model and is based on copulas and a multivariate quantile version with a focus on the bivariate case. The model offers increased flexibility to designers by leading to several scenarios associated with the same risk. The univariate quantiles represent special cases corresponding to the extreme scenarios. A simulation study is carried out to evaluate the performance of the model in a bivariate framework. Simulation results show that bivariate FA provides the univariate quantiles with equivalent accuracy. Similarity is observed between results of the bivariate model and those of the univariate one in terms of the behavior of the corresponding performance criteria. The procedure performs better when the regional homogeneity is high. Furthermore, the impacts of small variations in the record length at gauged sites and the region size on the performance of the proposed procedure are not significant.

1. Introduction and Literature Review

[2] Extreme events, such as floods, storms and droughts, have serious economic, environmental and social consequences. It is hence of high importance to develop the appropriate models for the prediction of such events both at gauged and ungauged sites. Local and regional frequency analysis (FA) procedures are commonly used tools for the analysis of extreme hydrological events. The objective of regional frequency analysis (RFA) is to transfer information from gauged sites to an ungauged target site within a homogeneous region.

[3] Generally, hydrological events are characterized by several correlated variables. For instance, floods are described through their volume, peak and duration [Ashkar, 1980; Yue et al., 1999; Ouarda et al., 2000; Yue, 2001; Shiau, 2003; De Michele et al., 2005; Zhang and Singh, 2006; Chebana and Ouarda, 2009]. These studies have pointed out the importance of jointly considering all these variables. Depending on data sources and the number of variables that characterize the event, frequency analysis can be divided into four classes: univariate-local, univariate-regional, multivariate-local and multivariate-regional. The first two classes have been extensively studied [see, e.g., Stedinger and Tasker, 1986; Burn, 1990; Hosking and Wallis, 1993; Durrans and Tomic, 1996; Nguyen and Pandey, 1996; Alila, 1999, 2000; Ouarda et al., 2001, 2006; Chebana and Ouarda, 2008]. Recently, increasing attention has been given to multivariate-local FA by, e.g., Yue et al. [1999], Yue [2001], Shiau [2003], De Michele et al. [2005], Zhang and Singh [2006], and Chebana and Ouarda [2009]. However, much less attention is given to multivariate-regional FA. In this category, we find few references such as Ouarda et al. [2000] and Chebana and Ouarda [2007].

[4] Justifications for adopting the multivariate framework to treat extreme events were discussed in several references. In bivariate FA, Yue et al. [1999] concluded that single-variable hydrological FA can only provide limited assessment of extreme events. A better understanding of the probabilistic characteristics of such events requires the study of their joint distribution. It was also pointed out by Shiau [2003] that multivariate FA requires considerably more data and more sophisticated mathematical analysis. Univariate FA can be useful when only one random variable is significant for design purposes or when the two random variables are only weakly dependent. However, a separate analysis of the random variables cannot reveal the relationship between them when the correlation is important information in the design criteria. Therefore, it is important to jointly consider all the random variables that characterize the hydrological event.

[5] Three main elements are treated in multivariate-local FA literature: (1) explaining the usefulness and importance of considering the multivariate framework, (2) modeling extreme events by fitting the appropriate copula and marginal distributions, and estimating the corresponding parameters, and (3) defining bivariate return periods. However, despite the importance of quantiles in FA, the literature on multivariate-local FA did not specifically address the estimation of multivariate quantiles. Recently, Chebana and Ouarda [2009] introduced the notion of multivariate quantile in hydrological FA.

[6] Regional FA generally consists of two main steps: regional delineation and extreme quantile estimation [see, e.g., Groupe de recherche en hydrologie statistique (GREHYS), 1996a]. In the multivariate context, the delineation step was treated by Chebana and Ouarda [2007], where multivariate discordancy and homogeneity statistical tests were proposed. In univariate-regional FA, different quantile estimation methods were proposed in the literature, such as the index flood method and regressive models [see GREHYS, 1996a, 1996b]. As a natural continuation of the study by Chebana and Ouarda [2007] and in order to present a complete multivariate-regional FA framework, an estimation procedure in the multivariate context is presented in this paper. The present procedure is an extension of the index flood model to the multivariate context.

[7] The multivariate index flood model is based on two main concepts: multivariate quantile curves and the notion of copulas. The univariate index flood model aims to obtain an estimation of quantiles at ungauged sites using data (and hence quantiles) from sites within a specified region. The objective of FA is the quantile estimation which can be obtained through the cumulative distribution function or the density function. The multivariate quantile version adopted in this paper is a curve composed of combinations of the variables corresponding to the same risk. Copulas are employed in order to model the dependence between the variables describing the event.

[8] The paper is organized as follows. In section 2, we present some background elements required for the development of the methodology: the index flood model and multivariate quantile curves. In section 3, we present the multivariate index flood model. In section 4 a simulation study is carried out to evaluate the performance of the proposed model with an adaptation of the procedure to flood events. Results and discussions are reported in section 5, and the conclusions are presented in section 6.

2. Background

[9] The principal elements required for the development of the proposed estimation procedure are presented in this section; namely, the univariate index flood model and bivariate quantile curves.

2.1. Univariate Index Flood Model

[10] The index flood model was first introduced by Dalrymple [1960]. Similar models can also be used for other hydrological variables including droughts and storms [Pilon, 1990; Hosking and Wallis, 1997; Hamza et al., 2001]. In this model the region is assumed to be homogeneous. That is, all sites in the region have the same frequency distribution apart from a scale parameter that characterizes each site. Explicitly, for a region where data are available for N sites, the model gives the quantile Qi(p) corresponding to the nonexceedance probability p at site i as

$$Q_i(p) = \mu_i \, q(p), \qquad i = 1, \ldots, N, \quad 0 < p < 1, \tag{1}$$

where μi represents the index flood and q(.) is the regional growth curve.

[11] The index flood parameter μi may be estimated, for instance, as the sample mean at site i. The growth curve q(.) may be estimated using the standardized data of the whole region. Usually, the form of q(.) is assumed to be known through a regional distribution K(.; θ), except for a vector of parameters θ = (θ1,…, θs). More details about the index flood model can be found in the work by Hosking and Wallis [1993] and for a more recent review the reader is referred to Bocchiola et al. [2003].
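The model in (1) can be illustrated with a minimal sketch. It assumes that the index flood is the at-site sample mean and that the regional growth curve is a Gumbel quantile function; the growth-curve parameters and the sample values below are made up for illustration and do not come from the paper.

```python
import math

def gumbel_quantile(p, alpha, beta):
    """Quantile of a Gumbel distribution with scale alpha and location beta."""
    return beta - alpha * math.log(-math.log(p))

def index_flood_quantile(site_sample, growth_curve, p):
    """Q_i(p) = mu_i * q(p), with mu_i estimated by the at-site sample mean."""
    mu_i = sum(site_sample) / len(site_sample)
    return mu_i * growth_curve(p)

# Hypothetical regional growth curve: a Gumbel distribution whose mean is
# close to 1, as expected for data standardized by the site mean.
def growth(p):
    return gumbel_quantile(p, alpha=0.25, beta=0.856)

# Made-up annual flood values for one site (sample mean = 1000).
site_sample = [820.0, 1010.0, 950.0, 1200.0, 1020.0]
q100 = index_flood_quantile(site_sample, growth, p=0.99)
print(round(q100, 1))
```

Only the index flood changes from site to site; the growth curve is shared by the whole region, which is where the homogeneity assumption enters.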

2.2. Multivariate Quantiles

[12] In the literature, several studies proposed to extend the well-known univariate quantile to higher dimensions. Serfling [2002] presented a review and a classification of some of these multivariate quantile versions. According to this classification, there are two major categories of multivariate quantiles: vector- and real-valued quantiles. In the vector-valued class, we find multivariate quantiles based on depth functions [Serfling, 2002], multivariate quantiles based on norm minimization defined by Abdous and Theodorescu [1992] and Chaudhuri [1996], multivariate quantiles as inversions of mappings studied by V. Koltchinskii and R. M. Dudley (On spatial quantiles, unpublished manuscript, 1996), and data-based multivariate quantiles based on gradients developed by Hettmansperger et al. [1992]. The real-valued quantile class contains the generalized quantile processes introduced by Einmahl and Mason [1992].

[13] More recently, Belzunce et al. [2007] defined another bivariate vector-valued quantile version. This version is not included in the review by Serfling [2002] and is focused on the bivariate context. Let (X, Y) be an absolutely continuous random vector and p ∈ [0, 1]. The pth bivariate quantile set or bivariate quantile curve for the direction ɛ is defined as

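$$QC_{X,Y}(p, \varepsilon) = \{(x, y) \in \mathbb{R}^2 : F_{\varepsilon}(x, y) = p\}, \tag{2}$$

where Fɛ(x, y) is one of the four following probabilities:

$$F_{\varepsilon_{--}}(x, y) = P\{X \le x, Y \le y\}, \quad F_{\varepsilon_{+-}}(x, y) = P\{X \ge x, Y \le y\},$$
$$F_{\varepsilon_{-+}}(x, y) = P\{X \le x, Y \ge y\}, \quad F_{\varepsilon_{++}}(x, y) = P\{X \ge x, Y \ge y\},$$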

which represent the probabilities of the events in the four quadrants of the plane.

[14] In other words, the bivariate quantile (2) is a curve corresponding to any combination (x, y) that satisfies Fɛ(x, y) = p (an infinity of combinations). This definition of the bivariate quantile is simple, intuitive and does not require any symmetry assumption. Furthermore, the bivariate distribution (copula and margins) appears in its evaluation. A multivariate quantile curve can be obtained for the uniform margins and then transformed using the univariate quantile function of each component. To this end, we introduce copulas as follows. A copula is a description of the dependence structure between two or more random variables. For more details on copula functions, the reader is referred, for instance, to Nelsen [2006] or to Chebana and Ouarda [2007]. Sklar's [1959] theorem is an important result which provides, for two random variables X and Y, the relationship between their bivariate joint distribution F, the corresponding copula C and marginal distributions FX and FY. Sklar's result states that there exists a copula C such that

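$$F(x, y) = C(F_X(x), F_Y(y)) \quad \text{for all real } x, y. \tag{3}$$

In addition, if FX and FY are continuous, the copula C is unique. Hence, for the event {X ≤ x, Y ≤ y}, using (2) and (3), the quantile curve can be expressed as follows:

$$QC_{X,Y}(p) = \{(x, y) \in \mathbb{R}^2 : x = F_X^{-1}(u),\ y = F_Y^{-1}(v);\ u, v \in [0, 1],\ C(u, v) = p\}. \tag{4}$$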

Consequently, in the present paper the proposed bivariate index flood model is based on (4). The resolution of equation (4), using copula and margin expressions, leads to several solutions called combinations. These combinations constitute the corresponding quantile curve.

[15] The usual univariate quantiles are special cases of the bivariate quantile curve given in (2) or (4). Indeed, Figure 1 illustrates that the univariate quantiles represent the extreme points of the proper part of the bivariate quantile curve.
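The special-case property follows from the boundary behavior of copulas. A short derivation, assuming the event {X ≤ x, Y ≤ y} and continuous, strictly increasing margins:

```latex
% Every copula satisfies C(u,1) = u, C(1,v) = v and C(u,v) <= min(u,v).
\begin{align*}
C(u, v) = p \ &\Rightarrow\ \min(u, v) \ge p
  \ \Rightarrow\ x = F_X^{-1}(u) \ge F_X^{-1}(p), \quad y = F_Y^{-1}(v) \ge F_Y^{-1}(p), \\
v \to 1 \ &\Rightarrow\ C(u, 1) = u = p
  \ \Rightarrow\ x \to F_X^{-1}(p), \\
u \to 1 \ &\Rightarrow\ C(1, v) = v = p
  \ \Rightarrow\ y \to F_Y^{-1}(p).
\end{align*}
```

Each univariate quantile is therefore the smallest value its variable takes on the curve, and it is approached at the end of the curve where the other variable becomes very large.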

Figure 1.

Illustration of the bivariate and univariate quantiles corresponding to the nonexceedance probability p.

[16] The following notations are employed throughout the paper and are illustrated in Figure 1:

[17] QCp is the bivariate quantile curve associated with a risk p of the nonexceedance event on the variables X and Y, which corresponds to QCX,Y(p, ɛ−−) of equation (2) (we may denote it by QCX,Y(p) when the variables need to be emphasized); QCx,y(p) represents a point (a combination) of the curve QCp; QCx(p) and QCy(p) are the coordinates of the point QCx,y(p), that is, QCx,y(p) = (QCx(p), QCy(p)).

[18] The univariate quantiles are denoted as QDX(p) and QDY(p) when directly evaluated and QLX(p) and QLY(p) when deduced as extreme values from the bivariate quantile curve. A complete list of the notations used in the paper is presented at the end of the document.

[19] We close this section by summarizing some key facts that are related to the notion of bivariate quantile in local FA (for more details see Chebana and Ouarda [2009]).

[20] 1. The quantile curves, for practical reasons, are composed of two parts: naïve part and proper part (central part). The naïve part is composed of two segments starting at the end of each extremity of the proper part. These segments are parallel to the axis. The points that define the extremities correspond to the maximum value for each of the variables x and y in the empirical version of the quantile curve or in the case where the marginal distributions are bounded (right or left according to the considered event). In the case of quantiles corresponding to the parent distribution when the margins are not bounded, there are two options to identify the extremities of the proper part. It is possible to take the extremities to be related to those of the empirical version. It is also possible to select the extremities to be as close as needed to the asymptotes. The first option is useful for the comparison of the empirical and true quantiles. For simplicity, Figure 1 presents the case where the margins are right bounded.

[21] 2. The marginal quantiles correspond to the extreme scenarios of the proper part related to the event.

[22] 3. For a given sample, univariate estimation results should be used cautiously since the combination of the univariate quantile values of each variable does not correspond to the desired risk and hence may lead to wrong conclusions.

[23] 4. Some events, relating both variables, cannot be expressed in the univariate context.

[24] 5. The number of bivariate quantile scenarios in the proper part decreases, when the risk p increases, and hence the proper part of the quantile curve becomes shorter.

[25] These key statements are illustrated in Figure 1. In the remainder of the paper, unless otherwise specified, the quantile curve refers to the proper part of the full curve. Generally, explicit or analytical expressions of bivariate quantiles are not available. Hence, bivariate quantiles are obtained numerically by solving equations (2) or (4). Some difficulties may arise when solving these equations, especially for values of p that are very close to 1 and/or for complex distributions (margins and copula). The procedure employed to obtain multivariate quantile curves is parametric. That is, the joint distribution F is assumed to belong to a class of parametric distributions with unknown parameters to be estimated. The class of parametric distributions is identified using goodness-of-fit tests. The parametric estimation approach is commonly used in hydrologic FA. In the univariate context some nonparametric approaches have been employed in hydrologic FA [see, e.g., Adamowski and Feluch, 1990; Ouarda et al., 2001] but these methods are of limited use for hydraulic design of major structures, as indicated by Singh and Strupczewski [2002].
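One possible way to carry out the numerical resolution is bisection on the copula level. The sketch below is not the authors' solver: it assumes the event {X ≤ x, Y ≤ y}, a copula passed as a plain function, and a grid restricted to interior points so that unbounded margins never get evaluated at 0 or 1. The function names are hypothetical.

```python
def solve_v(copula, u, p, tol=1e-12):
    """Find v in [p, 1] with copula(u, v) = p by bisection (needs u >= p)."""
    lo, hi = p, 1.0  # C(u, p) <= p and C(u, 1) = u >= p bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if copula(u, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile_curve(copula, inv_fx, inv_fy, p, n_points=50):
    """Interior combinations (x, y) of the curve C(F_X(x), F_Y(y)) = p."""
    points = []
    for k in range(1, n_points):
        u = p + (1.0 - p) * k / n_points
        v = solve_v(copula, u, p)
        points.append((inv_fx(u), inv_fy(v)))
    return points

# Illustration with the independence copula and uniform margins on [0, 1],
# for which the curve is simply x * y = p.
def independence(u, v):
    return u * v

def identity(u):
    return u

curve = quantile_curve(independence, identity, identity, p=0.9, n_points=10)
print(curve[0], curve[-1])
```

Bisection only needs the copula to be nondecreasing in each argument, which always holds, so the same routine applies to any fitted copula at the cost of more function evaluations than a closed-form solution.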

3. Multivariate Index Flood Model

[26] The following procedure represents a complete multivariate version of regional FA. It includes the two main steps: delineation of a homogeneous region and estimation of the extreme event. The step dealing with the delineation of a region is treated by Chebana and Ouarda [2007] who proposed multivariate discordancy D and homogeneity H statistical tests based on multivariate L moments. The statistic H is also employed as a heterogeneity measure for a given region. The development of the estimation step is the object of the present paper. It consists in extending the index flood model to the multivariate framework. For notation clarity and computation simplicity, the procedure is presented for the bivariate setting. Nevertheless, the procedure can be conceptually extended to higher dimensions. However, some theoretical and practical elements need to be developed such as copula modeling, parameter estimation and computational aspects. These elements are discussed at the end of the present section.

[27] Given a set of N sites with record length ni at site i, i = 1,…, N, the problem is to estimate, at the target site ℓ, the quantile of interest corresponding to a given risk p, 0 < p < 1 (or equivalently a return period T). The data are of the form (xij, yij) for j = 1,…, ni and i = 1,…, N where x and y represent realizations of the considered variables. Let qCp be the regional growth curve which represents a quantile curve common to every site in the region. It can be seen as a “regional quantile curve” and can be obtained on the basis of the standardized data of the whole region.

[28] The procedure is described as follows.

[29] 1. Identify the set of sites (region) to be used in the estimation as follows [Chebana and Ouarda, 2007]: apply the multivariate discordancy test D to identify discordant sites to be removed from the region and then check the homogeneity of the remaining sites by applying the multivariate homogeneity test H. Assume these sites are indexed from 1 to N′ (with N′ ≤ N).

[30] 2. For each site i, i = 1,…, N′, assess the location parameters $\hat{\mu}_{i,X}$ and $\hat{\mu}_{i,Y}$ and standardize the sample (xij, yij) to obtain (x′ij = xij/$\hat{\mu}_{i,X}$, y′ij = yij/$\hat{\mu}_{i,Y}$).

[31] 3. Select a family of regional multivariate distributions to fit the standardized data of the whole region (x′ij, y′ij) for j = 1,…, ni and i = 1,…, N′: this includes the marginal distributions as well as a copula. In the present context, assume that the regional distribution depends upon s parameters denoted θ1,…, θs.

[32] 4. Estimate the parameters of the distribution obtained in step 3. Obtain an estimator $\hat{\theta}_k^{(i)}$ of the kth parameter from the standardized data of the ith site, k = 1,…, s and i = 1,…, N′. The maximum likelihood method or the L moments–based method can be used for the estimation. Obtain the weighted regional parameter estimators:

$$\hat{\theta}_k^{R} = \frac{\sum_{i=1}^{N'} n_i \, \hat{\theta}_k^{(i)}}{\sum_{i=1}^{N'} n_i}, \qquad k = 1, \ldots, s. \tag{5}$$

[33] 5. Evaluate, for a given value of p, different combinations of the estimated growth curve $\widehat{qC}_{x,y}(p)$ from (4) using the fitted multivariate distribution with the corresponding weighted regional parameters $\hat{\theta}_k^{R}$, k = 1,…, s.

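[34] 6. Multiply componentwise each growth curve combination $(\widehat{qC}_x(p), \widehat{qC}_y(p))$ by the location parameters of the target site ℓ, $\hat{\mu}_{\ell,X}$ and $\hat{\mu}_{\ell,Y}$:

$$\widehat{QC}^{\ell}_{x,y}(p) = \left(\hat{\mu}_{\ell,X} \, \widehat{qC}_x(p),\ \hat{\mu}_{\ell,Y} \, \widehat{qC}_y(p)\right). \tag{6}$$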

Hence the obtained result in (6) is an estimate of the local quantile curve corresponding to the target site.
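Steps 2, 4 and 6 of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the site indices are sample means, the regional parameters are weighted by record length, and all function names are hypothetical. The fitting of margins and copula (steps 3 to 5) is left out.

```python
def site_mean(values):
    return sum(values) / len(values)

def standardize(sample):
    """Step 2: divide each variable by its at-site index (here the mean)."""
    xs, ys = zip(*sample)
    mu_x, mu_y = site_mean(xs), site_mean(ys)
    scaled = [(x / mu_x, y / mu_y) for x, y in sample]
    return (mu_x, mu_y), scaled

def regional_parameters(site_estimates, record_lengths):
    """Step 4: record-length-weighted average of at-site parameter vectors."""
    total = sum(record_lengths)
    n_params = len(site_estimates[0])
    return [
        sum(n * est[k] for n, est in zip(record_lengths, site_estimates)) / total
        for k in range(n_params)
    ]

def local_quantile_curve(growth_points, mu_target):
    """Step 6: scale each growth-curve combination by the target-site indices."""
    mu_x, mu_y = mu_target
    return [(mu_x * qx, mu_y * qy) for qx, qy in growth_points]

# Made-up numbers: two sites, two parameters each, record lengths 10 and 30.
theta_regional = regional_parameters([[1.0, 2.0], [3.0, 4.0]], [10, 30])
indices, scaled = standardize([(2.0, 10.0), (4.0, 30.0)])
local_curve = local_quantile_curve([(1.0, 2.0), (1.5, 1.2)], (10.0, 5.0))
print(theta_regional, indices, local_curve)
```

Because the scaling in step 6 is componentwise, every combination on the growth curve maps to one combination on the local curve and the risk level p is unchanged.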

[35] Note that $\hat{\mu}_{\ell,X}$ and $\hat{\mu}_{\ell,Y}$, representing the indices of the target site ℓ, are generally assumed to be location parameters. Particular values of these indices can be the sample median or the sample mean. Furthermore, they can be obtained from meteorological and physiographical features of an ungauged site, for instance, through a linear model. Since the classical index flood model is based on the nonexceedance probability FX(x) = P(X ≤ x), we have considered, in this procedure, the probability of the event {X ≤ x, Y ≤ y}. However, if another event is of interest, appropriate changes can easily be made to the proposed model.

[36] To deal with step 3 in the described procedure, goodness-of-fit tests are required for copula as well as for marginal distributions. Such tests are well known in the literature for univariate distributions. For instance, the empirical cumulative distribution function given by Cunnane [1978] can be used. Recently, some statistical tests (numerical or graphical) have been developed to test copula's goodness of fit [see, e.g., Fermanian, 2005; Genest et al., 2009].

[37] We end this section by stating the elements required to define the above procedure in a multivariate setting. Let (X1,…, Xd) be a random vector defined on Rd, d ≥ 1, with joint distribution F and marginal distributions FX1,…, FXd. On the basis of Sklar's theorem, there exists a copula C such that F(x1,…, xd) = C(FX1(x1),…, FXd(xd)) for real x1,…, xd.

[38] Assume that we are interested in the event {X1 ≤ x1,…, Xd ≤ xd}. Then the corresponding multivariate quantile is given by

$$QC_{X_1,\ldots,X_d}(p) = \{(x_1, \ldots, x_d) \in \mathbb{R}^d : x_j = F_{X_j}^{-1}(u_j),\ u_j \in [0, 1],\ j = 1, \ldots, d,\ C(u_1, \ldots, u_d) = p\}.$$

For d = 3 the multivariate quantile represents a surface in a three-dimensional space. For the target site ℓ, equation (6) of the index flood model becomes

$$\widehat{QC}^{\ell}_{x_1,\ldots,x_d}(p) = \left(\hat{\mu}_{\ell,X_1} \, \widehat{qC}_{x_1}(p),\ \ldots,\ \hat{\mu}_{\ell,X_d} \, \widehat{qC}_{x_d}(p)\right).$$

Therefore, all the theoretical elements required to define the procedure in a d-dimensional space are available. However, in practice, some difficulties arise. A key point is related to the effective modeling of the multivariate copula. Even though some well-known classes of Archimedean copulas and extreme value copulas are available in the multivariate setting, they are not convenient to model when the dependence structure is complex. Fitting other kinds of copulas is a topic of continuous development. The number of parameters to be estimated (related to each marginal distribution and to the copula) grows quickly with the dimension d. The numerical difficulties encountered in the bivariate setting become even more important.

4. Performance Evaluation Using Simulation

[39] In order to evaluate the performance of the proposed model, a simulation study is carried out. Before starting the simulation procedure, it is required to define the regions to be simulated and convenient evaluation criteria. Note that these evaluation criteria are also employed by Chebana and Ouarda [2009] and are adapted in the present work to the regional context. Recall that the variables X and Y are selected to be the flood volume and flood peak, respectively (Figure 2).

Figure 2.

Typical flood hydrograph.

4.1. Simulated Regions

[40] As already noted, the index flood model requires the region to be homogeneous. However, to be more realistic, the region can also be “possibly homogeneous” rather than “exactly homogeneous” [see Hosking and Wallis, 1997]. In that case, the value of the corresponding heterogeneity statistical measure H could be between 1 and 2. Since the present study is a continuation of the work by Chebana and Ouarda [2007], the same regional distribution can be considered, namely, a bivariate distribution with Gumbel margins and Gumbel logistic copula given respectively by

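$$F_X(x) = \exp\left\{-\exp\left[-\frac{x - \beta_X}{\alpha_X}\right]\right\}, \qquad x \in \mathbb{R},\ \alpha_X > 0,\ \beta_X \in \mathbb{R}, \tag{7}$$

$$C_{\gamma}(u, v) = \exp\left\{-\left[(-\log u)^{\gamma} + (-\log v)^{\gamma}\right]^{1/\gamma}\right\}, \qquad \gamma \ge 1,\ 0 \le u, v \le 1. \tag{8}$$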

By replacing each x by y in (7), we obtain the expression of the marginal distribution of the variable Y. The case where γ = 1 in (8) corresponds to complete independence of the two variables.
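For this particular model the curve C(u, v) = p can be solved for v in closed form, which avoids a numerical root search. The sketch below assumes the usual forms of the Gumbel margin (scale α, location β) and of the Gumbel logistic copula; the marginal parameter values are illustrative placeholders, not those of the paper.

```python
import math

def gumbel_inv(u, alpha, beta):
    """Inverse of F(x) = exp(-exp(-(x - beta) / alpha))."""
    return beta - alpha * math.log(-math.log(u))

def gumbel_logistic_copula(u, v, gamma):
    s = (-math.log(u)) ** gamma + (-math.log(v)) ** gamma
    return math.exp(-s ** (1.0 / gamma))

def curve_v(u, p, gamma):
    """Closed-form v with C(u, v) = p, valid for p < u < 1."""
    d = (-math.log(p)) ** gamma - (-math.log(u)) ** gamma
    return math.exp(-d ** (1.0 / gamma))

def quantile_curve(p, gamma, margin_x, margin_y, n_points=20):
    """Interior combinations (x, y) of the bivariate quantile curve."""
    points = []
    for k in range(1, n_points):
        u = p + (1.0 - p) * k / n_points
        v = curve_v(u, p, gamma)
        points.append((gumbel_inv(u, *margin_x), gumbel_inv(v, *margin_y)))
    return points

# Illustrative (alpha, beta) pairs; hypothetical values chosen for the example.
curve = quantile_curve(0.99, 1.414, margin_x=(300.0, 1200.0), margin_y=(15.0, 50.0))
print(curve[0], curve[-1])
```

Along the returned list x increases while y decreases, which reflects the trade-off between the two variables at a fixed risk p.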

[41] The corresponding parameters of the bivariate distribution when the region is homogeneous are

equation image

This value of γ is equivalent to the correlation coefficient ρ = 0.5 and Kendall's tau coefficient τ ≈ 0.3, where τ = 4E[F(X, Y)] − 1. Indeed, the parameter γ is related to ρ and τ, respectively, according to the following expressions [see Gumbel and Mustafi, 1967; Genest and Rivest, 1993]:

$$\rho = 1 - \frac{1}{\gamma^{2}}, \tag{10}$$

$$\tau = 1 - \frac{1}{\gamma}. \tag{11}$$

The parameter values in (9) are those of a real data set treated by Yue and Rasmussen [2002] and concern the Skootamatta River in Ontario, Canada.
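The correspondence between γ, ρ and τ can be checked numerically. The sketch assumes the relations ρ = 1 − 1/γ² and τ = 1 − 1/γ for the Gumbel logistic model, which reproduce the values quoted in the text; the function names are hypothetical.

```python
import math

def rho_from_gamma(gamma):
    """Correlation coefficient implied by the dependence parameter."""
    return 1.0 - 1.0 / gamma ** 2

def tau_from_gamma(gamma):
    """Kendall's tau implied by the dependence parameter."""
    return 1.0 - 1.0 / gamma

def gamma_from_rho(rho):
    return 1.0 / math.sqrt(1.0 - rho)

def gamma_from_tau(tau):
    """Moment-type estimator of gamma from a sample Kendall's tau."""
    return 1.0 / (1.0 - tau)

for gamma in (1.0, 1.414, 3.162):
    print(gamma, round(rho_from_gamma(gamma), 3), round(tau_from_gamma(gamma), 3))
```

The inverse relation `gamma_from_tau` is the kind of expression used when γ is estimated from the sample value of Kendall's tau at each site.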

[42] Three kinds of regions are considered as follows.

[43] 1. The first region is homogeneous (Homog): all sites have the same distribution with the same parameters given above.

[44] 2. The second region is a 30% completely heterogeneous region (HetCo30): the scale and dependence parameters (αX, αY and γ) increase linearly from the first to the last site in the 30% range centered around the homogeneous region parameters; for example, for a given value of αX the variation is in the range [αX(1 − 0.3/2), αX(1 + 0.3/2)].

[45] 3. The third region is a 50% heterogeneous region on the marginal parameters (HetMa50): it is the same as the above region but the dependence parameter γ is fixed and the variation is on the marginal parameters αX and αY.

[46] The representative simulated regions are composed of a number of sites N = 15 and each site contains n = ni = 30 observations where γ = 1.414 (in the remainder of the paper other values of γ are also considered). For the regions HetCo30 and HetMa50, the corresponding mean values of the heterogeneity statistical measure H are 1.30 and 1.36, respectively. Hence they can be considered effectively as “possibly homogeneous” regions. These values are obtained following the procedure defined by Chebana and Ouarda [2007]. Note that the location parameters βX and βY are considered to be fixed in the generated regions at the values given in (9). These parameters have no effect on the heterogeneity measure H since we are interested in the variability aspect [see also Hosking and Wallis, 1993; Chebana and Ouarda, 2007].

[47] According to the linear variation of the parameters in the regions HetCo30 and HetMa50, the corresponding sites are ranked from 1 to N. That is, for instance, the smallest parameter values and the largest ones are associated with the first site and the last site, respectively, whereas the parameters in (9) correspond to the middle site.
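The linear variation of a parameter across the sites can be written in a few lines. This sketch assumes equally spaced values between the two ends of the range, so that the middle site of an odd-sized region receives the homogeneous value; the function name and the base value 100 are hypothetical.

```python
def site_parameters(center, spread, n_sites):
    """Values rising linearly from center*(1 - spread/2) to center*(1 + spread/2).

    Requires n_sites >= 2.
    """
    low = center * (1.0 - spread / 2.0)
    high = center * (1.0 + spread / 2.0)
    step = (high - low) / (n_sites - 1)
    return [low + i * step for i in range(n_sites)]

# A 30% range around an illustrative scale value of 100 for N = 15 sites.
alphas = site_parameters(100.0, 0.3, 15)
# The same rule applied to the dependence parameter gamma = 1.414.
gammas = site_parameters(1.414, 0.3, 15)
print(alphas[0], alphas[7], alphas[-1])
```

For a HetMa50-type region the same function would be called with a spread of 0.5 on the marginal scale parameters only, keeping γ constant.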

[48] For comparison purposes, and to study the effect of various factors on the estimation results, regions other than the representative ones are generated: (1) regions Homog and HetMa50 in the independence case (γ = 1 in the Gumbel logistic copula (8)) with ni = n = 30 and N = 15 for each region; (2) regions Homog, HetCo30 and HetMa50 in the dependence case (γ = 1.414) where n = 30 and 60 with N = 15 for each region; (3) regions Homog, HetCo30 and HetMa50 in the dependence case (γ = 1.414) where N = 10, 15, 20, 50 and 100 with n = 30 for each region; and (4) regions HetCo60 and HetCo80 (similar to HetCo30 with 60% and 80% instead of 30%) in the dependence case where γ = 3.162 (equivalent to ρ = 0.9) with n = 30 and N = 15 for each region.

[49] The values of n and N are selected on the basis of situations commonly encountered in RFA [e.g., Hosking and Wallis, 1997; Chebana and Ouarda, 2007]. Note that the region HetCo30 cannot be considered in the independent case 1 where the parameter γ is fixed at γ = 1. The regions in case 4 are heterogeneous with heterogeneity mean measures H = 2.48 and 5.29, respectively. They are considered to show the effect of the heterogeneity of the region on the estimation performances. The value of γ = 3.162 is considered to ensure that γ ≥ 1 for each site in the region. There are some intersections between the above considered cases, for instance, both cases 2 and 3 contain the region HetMa50 with n = 30, N = 15 and γ = 1.414. These repetitions are kept for the coherence of the result presentation.

[50] Ghoudi et al. [1998] developed an algorithm for the generation of samples of a bivariate variable (X, Y) according to the extreme value copula. This algorithm is used in the present case since the Gumbel logistic copula (8) is also an extreme value copula. The algorithm is summarized by Chebana and Ouarda [2007].
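For readers who want to experiment, samples from the Gumbel logistic copula can also be drawn with the generic conditional inversion method. This is a swapped-in technique, not the algorithm of Ghoudi et al. [1998]: draw u uniformly, then solve ∂C(u, v)/∂u = t for v by bisection, with t uniform. The copula form, the function names and the marginal parameters are assumptions made for the example.

```python
import math
import random

def conditional_cdf(v, u, gamma):
    """P(V <= v | U = u) = dC(u, v)/du for the Gumbel logistic copula."""
    a = -math.log(u)
    b = -math.log(v)
    s = a ** gamma + b ** gamma
    c = math.exp(-s ** (1.0 / gamma))
    return c * s ** (1.0 / gamma - 1.0) * a ** (gamma - 1.0) / u

def sample_copula_pair(gamma, rng):
    """One (u, v) pair by conditional inversion with 60 bisection steps."""
    eps = 1e-12
    u = min(max(rng.random(), eps), 1.0 - eps)
    t = rng.random()
    lo, hi = eps, 1.0 - eps
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if conditional_cdf(mid, u, gamma) < t:
            lo = mid
        else:
            hi = mid
    return u, 0.5 * (lo + hi)

def gumbel_inv(u, alpha, beta):
    return beta - alpha * math.log(-math.log(u))

def sample_site(n, gamma, margin_x, margin_y, rng):
    """n pairs (x, y) with Gumbel margins given as (alpha, beta) tuples."""
    pairs = (sample_copula_pair(gamma, rng) for _ in range(n))
    return [(gumbel_inv(u, *margin_x), gumbel_inv(v, *margin_y)) for u, v in pairs]

def kendall_tau(pairs):
    """Sample Kendall's tau (O(n^2)), used here as a sanity check."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

rng = random.Random(12345)
site = sample_site(30, 1.414, (300.0, 1200.0), (15.0, 50.0), rng)  # illustrative margins
print(len(site), round(kendall_tau(site), 2))
```

With γ = 1 the conditional distribution reduces to v itself, so the sampler returns independent uniforms, which matches the independence case of the copula.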

4.2. Performance Evaluation Criteria

[51] In the present multivariate context, the bivariate quantile is a curve. Hence, the estimation result is a curve instead of a real value as in the univariate framework. Consequently, the usual performance evaluation criteria are not adapted and should be defined differently. To evaluate the performance of the method, given the true quantile curve, an estimation of the corresponding quantile curve is obtained. Then, the evaluation consists in the assessment of the distance between the true and estimated curves.

[52] In the present context the quantile curve is a function. Consequently, we can adopt the notations: (x, g(x)) for the regional growth curve and (x, G(x)) for the at-site quantile curve. These notations will ensure the clarity of the definition of the evaluation criteria. Let M be the number of simulation repetitions, and let gp[m](x) and Gp[m],i(x) be the ordinates of the mth repetition of the estimated regional growth curve and site i quantile estimate for nonexceedance probability p (0 < p < 1), respectively. Then, the corresponding coordinatewise relative errors are given by

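$$r^{[m]}_{p,i}(x) = \frac{g_p^{[m]}(x) - g_p^{i}(x)}{g_p^{i}(x)}, \qquad R^{[m]}_{p,i}(x) = \frac{G_p^{[m],i}(x) - G_p^{i}(x)}{G_p^{i}(x)}, \tag{12}$$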

for fixed x component along the proper part of the curve where gpi(x) and Gpi(x) are the true at-site growth curve and quantile curve ordinates, respectively. Note that the differences in the numerator represent vertical distances between points of the underlying curves. As indicated in section 2.2, the points that define the extremities of the proper part correspond to the maximum value for each of the variables x and y in the estimated quantile curve. Since the Gumbel distribution is not right bounded, in order to obtain the values in (12), we select the extremities of the true curve according to those of the estimated curve.

[53] In the coordinatewise relative errors (12), there are three index dimensions: the index i is related to sites, the index m is related to the simulation replications and the last index x is related to the quantile combinations. Therefore, it is necessary to summarize the relative errors (12) to be interpretable. To avoid repetition, we focus on the relative errors related to the quantile and those of the growth curve can be obtained in a similar manner.

[54] To summarize errors (12) with respect to x, we consider distances or norms in functional spaces. In such spaces, some possible criteria are known as the Lr distances with r ≥ 1. They are defined between functions f1 and f2 on a given space S with a positive measure λ as [see, e.g., Jones, 1993, chapter 10]

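$$L^{r}(f_1, f_2) = \left(\int_S |f_1 - f_2|^{r} \, d\lambda\right)^{1/r}, \quad 1 \le r < \infty, \qquad L^{\infty}(f_1, f_2) = \sup_{S} |f_1 - f_2|. \tag{13}$$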

The particular cases L1, L2 and L∞ are the most commonly used. Note that the L1 distance is more intuitive and more representative than L2 and L∞, but is more complex to handle in theoretical proofs because of the presence of the absolute value. Furthermore, when using L1, the bias cannot be evaluated since, as a metric, it is never negative. For this reason, and to keep the same commonly employed performance criteria in frequency analysis, we proceed as follows. Let Lpi be the length of the proper part of the true quantile curve QCpi; then the relative integrated error is

$$RIE^{*[m]}_{i}(p) = \frac{1}{L_p^{i}} \int_{QC_p^{i}} \frac{G_p^{[m],i}(x) - G_p^{i}(x)}{G_p^{i}(x)} \, dx. \tag{14}$$

Note that the integral RIE*i[m](p) is similar to L1 but it is not a distance in the formal sense, since it may have negative values. To be differentiated from L1, the “pseudodistance” associated with RIE*i[m](p) is denoted by L1*. RIE*i[m](p) allows assessment of the regional bias. However, it is not appropriate for the variance evaluation since it can be close to zero even when the estimation is poor, because positive and negative errors cancel. For this reason, the root-mean-square errors are evaluated on the basis of the L1 distance. In the present context, the corresponding L1 distance is given by

$$RIE^{[m]}_{i}(p) = \frac{1}{L_p^{i}} \int_{QC_p^{i}} \left| \frac{G_p^{[m],i}(x) - G_p^{i}(x)}{G_p^{i}(x)} \right| dx. \tag{15}$$

In order to evaluate the estimation error for a site i, on the basis of RIE*i[m](p) and RIEi[m](p), the bias and root-mean-square errors are given respectively by

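$$B_i(p) = \frac{1}{M} \sum_{m=1}^{M} RIE^{*[m]}_{i}(p), \qquad RMSE_i(p) = \left[\frac{1}{M} \sum_{m=1}^{M} \left(RIE^{[m]}_{i}(p)\right)^{2}\right]^{1/2}. \tag{16}$$

To summarize these criteria over the sites of the region, it is possible to average them to obtain the regional bias, the absolute regional bias and the regional quadratic error given respectively by

$$RB^{R}(p) = \frac{1}{N} \sum_{i=1}^{N} B_i(p), \qquad ARB^{R}(p) = \frac{1}{N} \sum_{i=1}^{N} |B_i(p)|, \qquad RRMSE^{R}(p) = \frac{1}{N} \sum_{i=1}^{N} RMSE_i(p). \tag{17}$$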

The role of each one of these criteria is explained, for instance, by Hosking and Wallis [1997], in the univariate setting. The RBR measures the tendency of quantile estimates to be uniformly too high or too low across the whole region; the ARBR measures the tendency of quantile estimates to be consistently high at some sites and low at others; and the RRMSER measures the overall deviation of estimated quantiles from true quantiles.
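The criteria can be sketched numerically for curves tabulated on a common grid of x values. The sketch makes simplifying assumptions that are not taken from the paper: the integrals are approximated with the trapezoidal rule and normalized by the x range of the grid rather than by the curve length, and all function names are hypothetical.

```python
import math

def trapezoid(xs, fs):
    """Trapezoidal approximation of the integral of fs over xs."""
    return sum(0.5 * (fs[k] + fs[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))

def rie_signed(xs, estimated, true):
    """Signed relative integrated error (can cancel, used for the bias)."""
    rel = [(e - t) / t for e, t in zip(estimated, true)]
    return trapezoid(xs, rel) / (xs[-1] - xs[0])

def rie_absolute(xs, estimated, true):
    """Absolute relative integrated error (L1 type, used for the RMSE)."""
    rel = [abs((e - t) / t) for e, t in zip(estimated, true)]
    return trapezoid(xs, rel) / (xs[-1] - xs[0])

def site_bias(signed_errors):
    """Average of the signed errors over the M repetitions."""
    return sum(signed_errors) / len(signed_errors)

def site_rmse(absolute_errors):
    """Root mean square of the absolute errors over the M repetitions."""
    return math.sqrt(sum(e * e for e in absolute_errors) / len(absolute_errors))

def regional_criteria(biases, rmses):
    """Regional bias, absolute regional bias and regional RMSE over sites."""
    n = len(biases)
    return (sum(biases) / n, sum(abs(b) for b in biases) / n, sum(rmses) / n)

# Made-up curves: the estimate is too high on one side and too low on the other.
xs = [0.0, 1.0, 2.0]
true_curve = [2.0, 2.0, 2.0]
estimated_curve = [3.0, 2.0, 1.0]
signed = rie_signed(xs, estimated_curve, true_curve)
absolute = rie_absolute(xs, estimated_curve, true_curve)
print(signed, absolute)
```

In this example the signed error is zero although the estimated curve is clearly wrong, which is the reason the variance-type criteria rely on the absolute version.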

4.3. Simulation Procedure

[55] Once the regions and the evaluation criteria are identified, the simulation procedure can be defined. It is mainly based on the general procedure given in section 3. In the simulation procedure, there is no need to apply the discordancy test. The distribution is known a priori. Hence step 3 in section 3 is omitted. The repetition and evaluation steps are only for the simulation procedure and do not concern the general procedure. The simulation procedure consists of the following steps.

[56] 1. Generate a region as described in section 4.2 with data denoted as (xij, yij), j = 1,…, ni and i = 1,…, N.

[57] 2. For each site i, i = 1,…, N, apply the following steps. 2.1. Evaluate the sample means of both variables, $\hat{\mu}_{i,X}$ and $\hat{\mu}_{i,Y}$. 2.2. Standardize the sample to obtain ($x'_{ij} = x_{ij}/\hat{\mu}_{i,X}$, $y'_{ij} = y_{ij}/\hat{\mu}_{i,Y}$), j = 1,…, ni. 2.3. Estimate, from the standardized sample, the parameters of the bivariate distribution of the generated region. For the marginal Gumbel distributions, the estimators $\hat{\alpha}_X^{(i)}$, $\hat{\alpha}_Y^{(i)}$, $\hat{\beta}_X^{(i)}$ and $\hat{\beta}_Y^{(i)}$ are obtained using the L moment method given by Hosking and Wallis [1997]. The copula parameter γ is estimated by $\hat{\gamma}^{(i)}$ using expression (11).

[58] 3. Obtain the regional parameters using the weighted mean given by (5).

[59] 4. Obtain the combinations of the bivariate regional growth curve qCp using (7) and (8) replaced in (4), on the basis of the standardized data, for a fixed value of the risk p (here we take p = 0.9, 0.99 and 0.995).

[60] 5. Apply the index flood model (6) using the growth curves obtained in step 4 and the location estimators obtained from step 2.1.

[61] 6. Repeat steps 1 to 5 M times where M is a large number, say M = 2000.

[62] 7. Evaluate for each site the true quantile curves QCp using the parameters of the parent distribution according to the type of region (e.g., Homog, HetMa50 and HetCo30).

[63] 8. Evaluate the true bivariate growth curves qCp using the parameters of the parent distribution for each site as follows. Find the population means μX and μY from the distribution parameters on each variable. In the present case, for the Gumbel distribution, we have: μX = βX + 0.5772αX and μY = βY + 0.5772αY. Divide each component of the true quantiles QCp (from step 7) by the corresponding population mean.

[64] 9. Evaluate the performance criteria described in equations (13) and (14), then (15) and finally (16).
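The at-site part of the procedure (steps 2.1 to 2.3, margins only) can be sketched as follows. This is a minimal illustration, not the authors' code: it standardizes one record by its sample mean and fits a Gumbel margin from the first two sample L moments, using the standard relations α = λ2/ln 2 and β = λ1 − 0.5772α. The function names, the seed and the parent parameters (α = 300, β = 1200) are hypothetical; the copula parameter estimate of expression (11) is not covered.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler constant, 0.5772 in the text

def standardize(sample):
    """Steps 2.1-2.2: divide a site's record by its sample mean."""
    mean = sum(sample) / len(sample)
    return mean, [value / mean for value in sample]

def gumbel_lmoment_fit(sample):
    """Step 2.3 (margins): Gumbel scale and location from sample L moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    # Unbiased probability weighted moment b1; enumerate gives j - 1 for rank j.
    b1 = sum(j * xj for j, xj in enumerate(x)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0
    alpha = l2 / math.log(2.0)
    beta = l1 - EULER_GAMMA * alpha
    return alpha, beta

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def draw_gumbel(alpha, beta):
    """Inverse-transform draw from a Gumbel distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return beta - alpha * math.log(-math.log(u))

record = [draw_gumbel(300.0, 1200.0) for _ in range(30)]  # n = 30, hypothetical site
mean_x, std_record = standardize(record)
alpha_std, beta_std = gumbel_lmoment_fit(std_record)
alpha_raw, beta_raw = gumbel_lmoment_fit(record)
```

Because L moments are linear in the data, fitting the standardized record is equivalent to fitting the raw record and dividing both parameters by the sample mean, and the fitted standardized distribution has mean one by construction.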

[65] Step 8 is introduced to evaluate the performance of the estimation of the growth curve. In addition, note that step 8 produces a true growth curve for each site which is required for the simulation of heterogeneous regions. However, these growth curves are identical for homogeneous regions as it is assumed by the index flood model. This is similar to the univariate setting employed, for instance, by Hosking and Wallis [1997].
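Steps 7 and 8 can be illustrated with a short sketch under two assumptions that are not confirmed by this section: that the quantile curve is the level set C(FX(x), FY(y)) = p, and that copula (8) has the standard Gumbel logistic form. The marginal parameters below are hypothetical; γ = 1.414 is the value quoted for Figure 4. Points are taken strictly inside the curve because its two ends are asymptotic.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler constant, 0.5772 in the text

def gumbel_quantile(u, alpha, beta):
    """Univariate Gumbel quantile for nonexceedance probability u."""
    return beta - alpha * math.log(-math.log(u))

def level_curve_v(u, p, gamma):
    """Solve C(u, v) = p for v under the assumed Gumbel logistic copula."""
    t_p, t_u = abs(math.log(p)), abs(math.log(u))
    return math.exp(-((t_p ** gamma - t_u ** gamma) ** (1.0 / gamma)))

def quantile_curve(p, gamma, ax, bx, ay, by, n_points=9):
    """Points (x, y) on the p-level curve, for u strictly between p and 1."""
    points = []
    for k in range(1, n_points + 1):
        u = p + (1.0 - p) * k / (n_points + 1)
        v = level_curve_v(u, p, gamma)
        points.append((gumbel_quantile(u, ax, bx), gumbel_quantile(v, ay, by)))
    return points

def growth_curve(points, ax, bx, ay, by):
    """Step 8: divide each coordinate by the population mean of its margin."""
    mu_x = bx + EULER_GAMMA * ax
    mu_y = by + EULER_GAMMA * ay
    return [(x / mu_x, y / mu_y) for x, y in points]

# Hypothetical site parameters (not the paper's parameters (9)).
AX, BX, AY, BY, GAMMA, P = 300.0, 1200.0, 15.0, 50.0, 1.414, 0.9
true_curve = quantile_curve(P, GAMMA, AX, BX, AY, BY)
true_growth = growth_curve(true_curve, AX, BX, AY, BY)
```

Under these assumptions every point of the curve lies above both univariate p-quantiles, which is consistent with the extreme points of the proper part being slightly larger than the directly evaluated quantiles.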

[66] After the bivariate procedure, the univariate estimation procedure is also applied for comparison purposes. Note that some elements of the present simulation procedure are inspired by Hosking and Wallis [1997] and by Chebana and Ouarda [2007].

5. Results

[67] The application of the described simulation procedure leads to the results presented herein. This section is divided into three parts: in the first one we present the preliminary simulations, in the second we present and analyze the main results and in the last part we study the effect of some factors on the performance of the proposed model.

5.1. Preliminary Simulations

[68] Before analyzing the results, three preliminary simulations are produced to explain some of the previously introduced notions. The first one corresponds to one repetition (M = 1) of the simulation procedure on the region HetMa50 with p = 0.9, where n = 30 and N = 15. Figure 3a shows the true and estimated quantile curves of the first, the middle and the last sites in the region. Table 1 presents the relative estimation errors for the same three sites. Relative errors for bivariate quantiles, as curves, are evaluated using (13). The univariate quantiles are evaluated directly or as extreme scenarios of the proper part of the bivariate quantile curves. The univariate estimations are evaluated on the basis of the usual relative errors [e.g., Hosking and Wallis, 1997]. On the basis of criterion (13), from Table 1, we observe that the middle site, whose parameters (9) are those of the homogeneous region, is the one best estimated. The estimation of the first site is acceptable whereas the estimation of the last site is the worst. The values in Table 1 reflect the evaluation criterion (13) used for the bivariate quantile estimation. For instance, it is clear from Figure 3a that the bivariate quantile curve of the last site is underestimated, with a large negative value (−20.94%); the univariate quantiles are both underestimated, and the relative error related to Y is negative with high magnitude. In addition, for each site, similarity is observed between the estimation errors of the univariate quantiles evaluated directly and as extreme scenarios, even when only one sample is considered. Note that similar errors do not imply similar estimated values. Table 2 presents the true values of the univariate quantiles evaluated directly and as extreme scenarios of the bivariate curve. Table 2 also provides an indication of the relative difference between the two values. These relative differences are very low (less than 0.5%). Therefore, one can consider that the values obtained by the two different methods are almost the same.

Figure 3.

Illustration of the proper parts of estimated and true 0.9-quantile curves (sample generated from HetMa50): (a) the 1st site, the 8th site, and the last site; (b) true quantile curves of the whole region; and (c) estimated quantile curves of the whole region. In Figures 3a and 3b the gray level ranges from light for the first site to black for the last site.

Table 1. Relative Errors Corresponding to the 1st Site, the 8th Site, and the Last Site of the 0.9-Quantile Estimates^a

| | 1st Site | 8th Site | 15th Site |
|---|---|---|---|
| RIE*i(p) for QCp | 3.78 | 2.09 | −20.94 |
| Relative error for QLX | 1.22 | 0.33 | −6.23 |
| Relative error for QDX | 1.24 | 0.42 | −6.09 |
| Relative error for QLY | 1.41 | 2.14 | −15.90 |
| Relative error for QDY | 1.44 | 2.24 | −15.77 |

^a The generated region is HetMa50 with n = 30 and N = 15. The univariate quantiles are evaluated directly and as extreme points of the bivariate quantile curves. The relative errors of the bivariate quantiles are evaluated using (13) and are given in percent.

Table 2. True Values of the Univariate Quantiles Evaluated Directly and as Extreme Points of the Bivariate Quantile Curve Using the Parameters of Site 8

| Risk | Variable | Direct | As Extreme Point | Relative Difference^a (%) |
|---|---|---|---|---|
| p = 0.9 | X | 2016.7 | 2021.4 | 0.23 |
| | Y | 92.8 | 93.1 | 0.32 |
| p = 0.99 | X | 2828.0 | 2832.3 | 0.15 |
| | Y | 135.7 | 135.9 | 0.15 |
| p = 0.995 | X | 3068.2 | 3081.8 | 0.44 |
| | Y | 148.4 | 149.1 | 0.47 |

^a In the relative difference we assume that the direct quantile value is the reference value; hence, relative difference is (as extreme point minus direct) divided by direct.
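The relative differences reported in Table 2 can be recomputed from the tabulated quantile values with the definition given in the table footnote. The short check below uses only the numbers printed in Table 2; the variable and function names are illustrative.

```python
# (direct, as extreme point) pairs from Table 2, in the order
# p = 0.9 (X, Y), p = 0.99 (X, Y), p = 0.995 (X, Y).
table2 = [
    (2016.7, 2021.4), (92.8, 93.1),
    (2828.0, 2832.3), (135.7, 135.9),
    (3068.2, 3081.8), (148.4, 149.1),
]

def relative_difference(direct, extreme):
    """(as extreme point minus direct) divided by direct, in percent."""
    return 100.0 * (extreme - direct) / direct

differences = [relative_difference(d, e) for d, e in table2]
print([round(d, 2) for d in differences])
```

All six values stay below 0.5%, in line with the statement in the text.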

[69] The second preliminary simulation results are presented in Table 3 which summarizes the values of the criteria related to the same previously generated region for the bivariate as well as the univariate estimation. For the bivariate case we opted to present all possible combinations of criteria (RBR, ARBR and RRMSER) and “norms” (L1*, L1 and L2). Note that the values in Table 3 represent the whole region while values in Table 1 represent only particular sites. Bivariate results show, as expected, that it is appropriate to use L1* for the RBR and ARBR evaluation and L1 for the RRMSER evaluation. The values associated with L2 are given only as an indication. Hence, and in order to save space in the remainder of the paper, they are omitted. Furthermore, the regional averages of the univariate quantile errors estimated directly or as extreme scenarios are similar. Figures 3b and 3c show the true and estimated quantile curves, respectively, for all sites of the generated region. It can be seen that the true quantile curves are well ordered whereas the estimated ones intersect. This is because in the first case the parameters are known and ordered whereas in the second case the parameters are estimated and do not necessarily keep their order. Nevertheless, the whole view of the region, composed by the two groups of curves (true and estimated), shows a good agreement as a region and not for each site alone. The curves in Figures 3b and 3c are also in agreement with the values corresponding to L1* in Table 3. The agreement here concerns the whole region (all sites together) rather than site by site.

Table 3. Relative Errors Corresponding to the 0.9-Quantile Estimates of the Generated Region^a

| Criterion | Bivariate: L1* | Bivariate: L1 | Bivariate: L2 | Univariate as Extreme Points: QLX | Univariate as Extreme Points: QLY | Univariate Directly: QDX | Univariate Directly: QDY |
|---|---|---|---|---|---|---|---|
| RBR | 0.20 | 9.55 | 11.24 | −0.06 | 0.14 | 0.02 | 0.24 |
| ARBR | 9.03 | 9.55 | 11.24 | 4.61 | 6.31 | 4.61 | 6.33 |
| RRMSER | 9.03 | 9.55 | 11.24 | 4.61 | 6.31 | 4.61 | 6.33 |

^a The generated region is HetMa50 with n = 30 and N = 15. The univariate quantiles are evaluated directly and as extreme points of the bivariate quantile curves. The relative errors of the bivariate quantiles are evaluated using (16) and are given in percent.

[70] As previously indicated, univariate quantiles can be either estimated directly using the index flood model or deduced from the bivariate estimation as extreme scenarios (extreme points of the quantile curve). Hence, in this third preliminary simulation, it is valuable to compare univariate quantiles obtained from both estimations. Table 4 shows, on the basis of M = 2000 generated HetMa50 regions with p = 0.9, 0.99 and 0.995, relative errors corresponding to univariate quantile estimation. These results, combined with those in Table 2, indicate that, for a given variable X or Y, directly evaluated quantile estimates are very similar to those obtained as extreme points of the bivariate quantile curves for all values of p and for each criterion. This result remains valid for the outputs of the main simulations. Consequently, results related to the univariate quantiles when evaluated as extreme points are omitted from the next simulation results. This result shows that the values provided by multivariate FA are very similar to those obtained by univariate FA, with an equivalent accuracy.

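Table 4. Relative Errors of Univariate Quantiles Evaluated Directly and as Extreme Points of the Bivariate Quantile Curve^a

| Criterion | p = 0.9: QLX | p = 0.9: QLY | p = 0.9: QDX | p = 0.9: QDY | p = 0.99: QLX | p = 0.99: QLY | p = 0.99: QDX | p = 0.99: QDY | p = 0.995: QLX | p = 0.995: QLY | p = 0.995: QDX | p = 0.995: QDY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RBR | −0.03 | 0.04 | 0.01 | 0.09 | 0.32 | 0.37 | 0.35 | 0.41 | 0.17 | 0.30 | 0.23 | 0.37 |
| ARBR | 3.12 | 3.50 | 3.08 | 3.47 | 5.50 | 5.87 | 5.48 | 5.86 | 5.98 | 6.45 | 5.93 | 6.41 |
| RRMSER | 6.20 | 7.33 | 6.18 | 7.32 | 8.03 | 9.12 | 8.02 | 9.12 | 8.42 | 9.53 | 8.40 | 9.51 |

^a The corresponding region is HetMa50 with n = 30 and N = 15.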

5.2. Main Results

[71] The main simulation results are presented in Tables 5a and 5b for all the considered regions. Even though the focus is on quantile curve estimation, the evaluation of the growth curve estimation qCp is also reported in order to explain the quantile results. From Tables 5a and 5b, three apparent elements can be observed.

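Table 5a. Estimation Results for the Considered Regions When the Variables Are Dependent^a

| Region | Risk | Criterion | Quantile: Biv^b | Quantile: QDX | Quantile: QDY | Growth Curve: Biv^b | Growth Curve: qDX | Growth Curve: qDY |
|---|---|---|---|---|---|---|---|---|
| Homog, H mean ≈ 0 | p = 0.9 | RBR | −0.22 | −0.13 | −0.08 | −0.23 | −0.07 | −0.04 |
| | | ARBR | 0.22 | 0.13 | 0.12 | 0.23 | 0.07 | 0.04 |
| | | RRMSER | 9.18 | 5.14 | 6.21 | 0.98 | 0.93 | 1.08 |
| | p = 0.99 | RBR | −0.07 | −0.04 | −0.07 | −0.22 | −0.06 | −0.08 |
| | | ARBR | 0.13 | 0.09 | 0.11 | 0.22 | 0.06 | 0.08 |
| | | RRMSER | 9.93 | 5.37 | 6.50 | 1.67 | 1.71 | 1.84 |
| | p = 0.995 | RBR | −0.48 | −0.24 | −0.16 | −0.45 | −0.22 | −0.14 |
| | | ARBR | 0.48 | 0.24 | 0.17 | 0.45 | 0.22 | 0.14 |
| | | RRMSER | 9.10 | 5.42 | 6.43 | 2.06 | 1.79 | 1.91 |
| HetCo30, H mean = 1.30 | p = 0.9 | RBR | −0.10 | −0.08 | −0.02 | −0.16 | −0.02 | 0.01 |
| | | ARBR | 3.18 | 1.85 | 2.09 | 1.89 | 1.85 | 2.07 |
| | | RRMSER | 9.75 | 5.52 | 6.62 | 2.18 | 2.16 | 2.43 |
| | p = 0.99 | RBR | 0.21 | 0.01 | 0.04 | −0.16 | 0.01 | 0.06 |
| | | ARBR | 5.86 | 3.25 | 3.45 | 3.48 | 3.25 | 3.50 |
| | | RRMSER | 11.56 | 6.47 | 7.53 | 3.96 | 3.82 | 4.09 |
| | p = 0.995 | RBR | −0.16 | −0.04 | −0.01 | −0.54 | −0.02 | −0.00 |
| | | ARBR | 5.89 | 3.57 | 3.81 | 3.83 | 3.54 | 3.77 |
| | | RRMSER | 11.06 | 6.71 | 7.74 | 4.34 | 4.11 | 4.43 |
| HetMa50, H mean = 1.36 | p = 0.9 | RBR | 0.34 | 0.01 | 0.09 | −0.05 | 0.06 | 0.12 |
| | | ARBR | 6.51 | 3.08 | 3.47 | 3.73 | 3.09 | 3.46 |
| | | RRMSER | 11.52 | 6.18 | 7.32 | 3.90 | 3.30 | 3.72 |
| | p = 0.99 | RBR | 1.17 | 0.35 | 0.41 | 0.00 | 0.33 | 0.40 |
| | | ARBR | 10.58 | 5.48 | 5.86 | 6.30 | 5.45 | 5.87 |
| | | RRMSER | 14.76 | 8.02 | 9.12 | 6.63 | 5.87 | 6.31 |
| | p = 0.995 | RBR | 0.77 | 0.23 | 0.37 | −0.52 | 0.25 | 0.38 |
| | | ARBR | 10.69 | 5.93 | 6.41 | 6.70 | 5.93 | 6.33 |
| | | RRMSER | 14.42 | 8.40 | 9.51 | 7.04 | 6.35 | 6.79 |

^a N = 15, n = 30. Results are given in percent.

^b The RBR and ARBR are evaluated using RIE*i[m](p), and the RRMSER is evaluated using RIEi[m](p).
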
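Table 5b. Estimation Results for the Considered Regions When the Variables Are Independent^a

| Region | Risk | Criterion | Quantile: Biv^b | Quantile: QDX | Quantile: QDY | Growth Curve: Biv^b | Growth Curve: qDX | Growth Curve: qDY |
|---|---|---|---|---|---|---|---|---|
| Homog | p = 0.9 | RBR | −0.07 | −0.04 | −0.06 | −0.11 | −0.08 | −0.06 |
| | | ARBR | 0.18 | 0.10 | 0.15 | 0.11 | 0.08 | 0.06 |
| | | RRMSER | 7.86 | 5.10 | 6.18 | 1.01 | 0.96 | 1.05 |
| | p = 0.99 | RBR | −0.12 | −0.13 | −0.04 | −0.13 | −0.09 | −0.07 |
| | | ARBR | 0.15 | 0.13 | 0.11 | 0.13 | 0.09 | 0.07 |
| | | RRMSER | 8.25 | 5.39 | 6.46 | 1.61 | 1.69 | 1.81 |
| | p = 0.995 | RBR | −0.34 | −0.16 | −0.17 | −0.26 | −0.15 | −0.17 |
| | | ARBR | 0.34 | 0.17 | 0.18 | 0.26 | 0.15 | 0.17 |
| | | RRMSER | 7.21 | 5.41 | 6.49 | 1.81 | 1.84 | 1.93 |
| HetMa50 | p = 0.9 | RBR | 0.53 | 0.09 | 0.11 | 0.03 | 0.05 | 0.11 |
| | | ARBR | 7.06 | 3.06 | 3.47 | 4.22 | 3.09 | 3.46 |
| | | RRMSER | 10.90 | 6.14 | 7.28 | 4.39 | 3.32 | 3.71 |
| | p = 0.99 | RBR | 1.09 | 0.26 | 0.41 | 0.02 | 0.30 | 0.39 |
| | | ARBR | 10.60 | 5.43 | 5.91 | 6.43 | 5.45 | 5.87 |
| | | RRMSER | 13.82 | 8.02 | 9.14 | 6.71 | 5.86 | 6.31 |
| | p = 0.995 | RBR | 0.87 | 0.31 | 0.38 | −0.53 | 0.32 | 0.37 |
| | | ARBR | 10.30 | 5.91 | 6.32 | 6.89 | 5.93 | 6.33 |
| | | RRMSER | 13.11 | 8.40 | 9.48 | 7.20 | 6.37 | 6.79 |

^a N = 15, n = 30. Results are given in percent.

^b The RBR and ARBR are evaluated using RIE*i[m](p), and the RRMSER is evaluated using RIEi[m](p).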

[72] 1. In general, the values of the performance criteria increase with respect to the risk p in both univariate and bivariate settings. This behavior, well known in univariate FA, is not systematic in the bivariate estimation, especially in terms of RRMSER. The usual explanation is that generally in FA, a quantile associated with a risk p is more accurately estimated than another one associated with a risk p′ if p < p′ (when p and p′ are close to 1). The reason is that for a small risk, the corresponding quantile is close to the central body of the distribution, and hence, an important part of the data contributes to its estimation. However, in the bivariate setting, the situation is not similar. Indeed, in the multivariate context, the central part of a distribution contains little probability mass compared to the univariate setting. This is even more pronounced in higher dimensions; see Scott [1992] for more details and examples.

[73] 2. The relative bias RBR is very small in all regions and for all values of p. However, the bivariate RBR values are larger than the corresponding univariate ones, although they do not exceed 1.17%. The low RBR values are due to the symmetry of the parameters of the simulated regions.

[74] 3. The growth curve qCp results, for each value of p, are very similar for both variables separately (univariate) and also jointly (bivariate), especially in terms of RRMSER. However, this is not the case for quantile QCp estimation where differences are noticeable between bivariate and univariate results. This can be explained by the errors induced from the estimation of the index μ. That is, if one variable has a high error in its index μ, then, when multiplying it by the growth curve qCp, the final estimation result is affected accordingly. Note that the uncertainty related to the mean has more effect on the variability of the quantiles (through the RRMSER) than on the bias (RBR) because of error compensation in the RBR.

[75] In the homogeneous regions (Table 5a), the variability expressed in terms of the RRMSER in the growth curve estimation qCp is small compared to that of QCp for a fixed p. Hence, in homogeneous regions, the variability in QCp estimation originates essentially from the estimation of the index μ in equations (1) and (6). However, with respect to p, the variability in qCp estimation increases faster than the variability in QCp. This result may be explained by the fact that the mean has more influence on the central part of the distribution than on the tail. Hence, the contribution of the index variability decreases for large values of p.

[76] Table 5b presents results of Homog and HetMa50 when the variables X and Y are assumed to be independent, that is γ = 1 in copula (8). From Tables 5a and 5b, the comparison of the dependent and independent cases reveals two principal elements. First, there is no significant difference in the univariate results: The results of univariate estimation quantiles remain almost the same in both dependent and independent cases. The reason is that, intuitively, the marginal distributions are not affected by the copula, and mathematically, the copula has always the same values in the extreme points, that is C(u, 1) = u and C(1,v) = v for all u, v in [0,1] [see, e.g., Nelsen, 2006]. Second, in bivariate quantile estimation, the criteria values are slightly smaller in the independent case than in the dependent one. This may be justified by the presence of an extra parameter to be estimated in the dependent case (the parameter γ). This parameter in the independent case is not estimated and is fixed at γ = 1. We conclude that univariate estimation ignores the dependence structure of the event.
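The two copula properties used in this argument can be checked numerically. The sketch below assumes that copula (8) has the standard Gumbel logistic form C(u, v) = exp{−[(−ln u)^γ + (−ln v)^γ]^(1/γ)}, which is not shown in this section; the function name and the test points are illustrative. The values γ = 1.414 and γ = 3.162 are those quoted in the text.

```python
import math

def gumbel_logistic_copula(u, v, gamma):
    """Assumed standard Gumbel logistic copula, for u and v in (0, 1]."""
    t_u, t_v = abs(math.log(u)), abs(math.log(v))
    return math.exp(-((t_u ** gamma + t_v ** gamma) ** (1.0 / gamma)))

u, v = 0.3, 0.7
for gamma in (1.0, 1.414, 3.162):
    # Boundary conditions: the margins are unaffected by the dependence parameter.
    assert math.isclose(gumbel_logistic_copula(u, 1.0, gamma), u)
    assert math.isclose(gumbel_logistic_copula(1.0, v, gamma), v)

# gamma = 1 gives the independence copula C(u, v) = u * v.
independent = gumbel_logistic_copula(u, v, 1.0)
dependent = gumbel_logistic_copula(u, v, 1.414)
print(independent, dependent)
```

Under this form the boundary values do not depend on γ, which is why the univariate results are essentially the same in Tables 5a and 5b, while the joint probability changes with γ.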

[77] Figure 4 shows the quantile estimation performance with respect to the 15 sites within the same simulated regions HetMa50 presented in Table 5a. The Bi and Ri, defined in (15), represent the RB and the RRMSE, respectively, for a site i. Similar results are obtained for the other types of regions and hence they are omitted. From Figure 4, the behaviors of both the RB and the RRMSE obtained from the bivariate and univariate models are similar. It is observed that for each value of p, the bias is positive for the first half of the sites (from the 1st to the 8th site) and negative for the other half (from the 9th to the 15th site). This fact is observed in the univariate setting as well as in the bivariate one. It may be explained as follows: In the first half, the true quantile $Q$ is smaller than the average quantile over the region $\bar{Q}$, that is, $Q < \bar{Q}$, since small values of the scale parameter α reduce quantile values in regions like HetMa50. Furthermore, the average quantile value should be very close to the estimated one $\hat{Q}$ ($\bar{Q} \approx \hat{Q}$). Therefore, the quantile relative error is positive since it is greater than the negligible difference between the estimate and the average quantiles ($\hat{Q} - Q > \hat{Q} - \bar{Q} \approx 0$). The other half of the sites, where the bias is negative, can be treated similarly. Furthermore, we note that the RRMSE is small for sites with parameters close to those of the homogeneous region given in (9) and it increases according to the deviation of the site parameters from the central one. This is more apparent for high values of p. This behavior of the RB and RRMSE with respect to site number is also observed in the univariate index flood model [Hosking and Wallis, 1997].
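The sign pattern of the bias can be reproduced with a population-level sketch in the univariate Gumbel case, without any sampling error. The region below is hypothetical: 15 sites whose scale parameter α varies linearly by ±25% around 300 with a common location β = 1200 (these are not the HetMa50 parameters of the paper), and the regional growth curve is taken as the unweighted average of the site growth curves.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler constant, 0.5772 in the text

def gumbel_quantile(p, alpha, beta):
    """Univariate Gumbel quantile for nonexceedance probability p."""
    return beta - alpha * math.log(-math.log(p))

def index_flood_relative_errors(alphas, beta, p):
    """Relative error (percent) at each site when a single regional
    growth curve replaces the site-specific growth curves."""
    means = [beta + EULER_GAMMA * a for a in alphas]
    true_q = [gumbel_quantile(p, a, beta) for a in alphas]
    growth = [q / m for q, m in zip(true_q, means)]
    regional_growth = sum(growth) / len(growth)
    return [100.0 * (m * regional_growth - q) / q for m, q in zip(means, true_q)]

# Hypothetical heterogeneous region: alpha from 225 to 375 over 15 sites.
alphas = [300.0 * (0.75 + 0.5 * k / 14) for k in range(15)]
errors = index_flood_relative_errors(alphas, beta=1200.0, p=0.99)
print([round(e, 2) for e in errors])
```

In this simplified setting the error decreases monotonically with the site number, positive for the sites with the smallest scale parameter and negative for those with the largest, which mirrors the behavior described for Figure 4.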

Figure 4.

RB and RRMSE with respect to site number in the dependence case (γ = 1.414) of the HetMa50 region for the quantile curve associated with (left) p = 0.9, (middle) p = 0.99, and (right) p = 0.995.

5.3. Effect of Various Factors on the Estimation Results

[78] The proposed estimation model (6) may be affected by several factors. In this section, we present a short study dealing with the impact of the record length n, the region size N as well as the degree of region heterogeneity. Table 6 presents estimation results for the Homog, HetCo30 and HetMa50 regions when the variables are dependent where N = 15 and n = 30, 60 and 100. To facilitate comparisons, results for n = 30 are taken from Table 5a. We observe that when n increases, the main improvement is related to the RRMSER for each value of p. However, the RBR and the ARBR remain almost constant. Note that the values of the RBR and the ARBR are very low and their variations can be considered as proportionally similar to those of the RRMSER. On the one hand, the improvement of the RRMSER, with respect to n, is related to the heterogeneity degree of the region. That is, the improvement decreases slightly from Homog to HetMa50. On the other hand, the improvement for the bivariate estimation is slightly more important than for the univariate estimation in all considered regions.

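Table 6. Quantile Estimation Results for Regions With Different Record Lengths^a

| Region | Risk | Criterion | n = 30: Biv^b | n = 30: QDX | n = 30: QDY | n = 60: Biv^b | n = 60: QDX | n = 60: QDY | n = 100: Biv^b | n = 100: QDX | n = 100: QDY |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Homog | p = 0.9 | RBR | −0.22 | −0.13 | −0.08 | 0.00 | 0.01 | −0.01 | 0.00 | −0.02 | −0.01 |
| | | ARBR | 0.22 | 0.13 | 0.12 | 0.11 | 0.04 | 0.08 | 0.08 | 0.04 | 0.06 |
| | | RRMSER | 9.18 | 5.14 | 6.21 | 6.67 | 3.64 | 4.39 | 5.26 | 2.83 | 3.43 |
| | p = 0.99 | RBR | −0.07 | −0.04 | −0.07 | −0.05 | −0.02 | −0.07 | −0.04 | −0.07 | −0.01 |
| | | ARBR | 0.13 | 0.09 | 0.11 | 0.09 | 0.05 | 0.09 | 0.07 | 0.07 | 0.03 |
| | | RRMSER | 9.93 | 5.37 | 6.50 | 7.13 | 3.76 | 4.56 | 5.57 | 2.93 | 3.52 |
| | p = 0.995 | RBR | −0.48 | −0.24 | −0.16 | −0.04 | −0.03 | 0.00 | −0.15 | −0.08 | −0.08 |
| | | ARBR | 0.48 | 0.24 | 0.17 | 0.12 | 0.06 | 0.08 | 0.19 | 0.10 | 0.10 |
| | | RRMSER | 9.10 | 5.42 | 6.43 | 6.83 | 3.81 | 4.64 | 5.42 | 2.98 | 3.56 |
| HetCo30 | p = 0.9 | RBR | −0.10 | −0.08 | −0.02 | 0.07 | 0.03 | 0.01 | 0.12 | 0.03 | 0.04 |
| | | ARBR | 3.18 | 1.85 | 2.09 | 3.27 | 1.87 | 2.04 | 3.28 | 1.82 | 2.03 |
| | | RRMSER | 9.75 | 5.52 | 6.62 | 7.43 | 4.19 | 4.98 | 6.34 | 3.47 | 4.12 |
| | p = 0.99 | RBR | 0.21 | 0.01 | 0.04 | 0.39 | 0.13 | 0.09 | 0.46 | 0.10 | 0.19 |
| | | ARBR | 5.86 | 3.25 | 3.45 | 6.09 | 3.24 | 3.47 | 6.26 | 3.26 | 3.55 |
| | | RRMSER | 11.56 | 6.47 | 7.53 | 9.71 | 5.24 | 5.97 | 8.69 | 4.59 | 5.23 |
| | p = 0.995 | RBR | −0.16 | −0.04 | −0.01 | 0.30 | 0.15 | 0.16 | 0.27 | 0.14 | 0.13 |
| | | ARBR | 5.89 | 3.57 | 3.81 | 6.13 | 3.53 | 3.76 | 6.26 | 3.51 | 3.77 |
| | | RRMSER | 11.06 | 6.71 | 7.74 | 9.34 | 5.45 | 6.17 | 8.47 | 4.81 | 5.43 |
| HetMa50 | p = 0.9 | RBR | 0.34 | 0.01 | 0.09 | 0.51 | 0.08 | 0.14 | 0.55 | 0.12 | 0.14 |
| | | ARBR | 6.51 | 3.08 | 3.47 | 6.67 | 3.09 | 3.46 | 6.75 | 3.07 | 3.46 |
| | | RRMSER | 11.52 | 6.18 | 7.32 | 9.72 | 4.97 | 5.83 | 8.84 | 4.38 | 5.10 |
| | p = 0.99 | RBR | 1.17 | 0.35 | 0.41 | 1.19 | 0.31 | 0.42 | 1.26 | 0.35 | 0.45 |
| | | ARBR | 10.58 | 5.48 | 5.86 | 10.91 | 5.47 | 5.87 | 11.04 | 5.44 | 5.86 |
| | | RRMSER | 14.76 | 8.02 | 9.12 | 13.34 | 6.97 | 7.75 | 12.71 | 6.45 | 7.16 |
| | p = 0.995 | RBR | 0.77 | 0.23 | 0.37 | 1.10 | 0.40 | 0.50 | 1.07 | 0.44 | 0.45 |
| | | ARBR | 10.69 | 5.93 | 6.41 | 10.83 | 5.93 | 6.33 | 10.96 | 5.93 | 6.37 |
| | | RRMSER | 14.42 | 8.40 | 9.51 | 13.08 | 7.39 | 8.19 | 12.46 | 6.91 | 7.60 |

^a Here n = 30, 60, and 100 with N = 15. Results are in percent.

^b The RBR and ARBR are evaluated using RIE*i[m](p), and the RRMSER is evaluated using RIEi[m](p).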

[79] Similarly to Table 6, Table 7 presents estimation results of 0.99 quantiles for Homog, HetCo30 and HetMa50 regions when the variables are dependent where n = 30 and N = 10, 15, 20, 50 and 100. In order to simplify the comparison, results for N = 15 are taken from Table 5a. We observe, for a given type of region, a very slight improvement (less than 1%) of the RRMSER in both univariate and bivariate estimations whereas the RBR and the ARBR are almost constant but with a variability proportionally similar to the variability of the RRMSER. This behavior with respect to N is similar for the three region types although the values are different. The results corresponding to the 0.9 and 0.995 quantiles lead to similar conclusions and hence are not presented.

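Table 7. Estimation Results of 0.99 Quantiles for Regions With Different Sizes^a

| Region | Criterion | N = 10: Biv^b | N = 10: QDX | N = 10: QDY | N = 15: Biv^b | N = 15: QDX | N = 15: QDY | N = 20: Biv^b | N = 20: QDX | N = 20: QDY | N = 50: Biv^b | N = 50: QDX | N = 50: QDY | N = 100: Biv^b | N = 100: QDX | N = 100: QDY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Homog | RBR | −0.08 | −0.10 | −0.04 | −0.07 | −0.04 | −0.07 | −0.18 | −0.08 | −0.14 | −0.24 | −0.16 | −0.11 | −0.20 | −0.11 | −0.11 |
| | ARBR | 0.17 | 0.12 | 0.10 | 0.13 | 0.09 | 0.11 | 0.20 | 0.12 | 0.15 | 0.27 | 0.17 | 0.13 | 0.24 | 0.12 | 0.14 |
| | RRMSER | 10.21 | 5.59 | 6.65 | 9.93 | 5.37 | 6.50 | 9.75 | 5.25 | 6.35 | 9.49 | 5.09 | 6.18 | 9.42 | 5.04 | 6.14 |
| HetCo30 | RBR | 0.21 | −0.01 | 0.06 | 0.21 | 0.01 | 0.04 | 0.30 | 0.08 | 0.08 | 0.17 | 0.01 | 0.03 | 0.11 | −0.01 | −0.00 |
| | ARBR | 6.05 | 3.35 | 3.63 | 5.86 | 3.25 | 3.45 | 5.85 | 3.21 | 3.46 | 5.64 | 3.09 | 3.31 | 5.64 | 3.07 | 3.32 |
| | RRMSER | 12.06 | 6.77 | 7.81 | 11.56 | 6.47 | 7.53 | 11.54 | 6.39 | 7.49 | 11.22 | 6.16 | 7.22 | 11.16 | 6.10 | 7.17 |
| HetMa50 | RBR | 1.01 | 0.23 | 0.36 | 1.17 | 0.35 | 0.41 | 1.10 | 0.35 | 0.39 | 0.94 | 0.21 | 0.32 | 0.89 | 0.24 | 0.26 |
| | ARBR | 10.81 | 5.62 | 6.05 | 10.58 | 5.48 | 5.86 | 10.36 | 5.32 | 5.74 | 10.16 | 5.20 | 5.59 | 10.06 | 5.13 | 5.53 |
| | RRMSER | 15.17 | 8.32 | 9.45 | 14.76 | 8.02 | 9.12 | 14.57 | 7.89 | 9.01 | 14.16 | 7.61 | 8.71 | 14.04 | 7.53 | 8.61 |

^a N = 10, 15, 20, 50 and 100 with n = 30 and p = 0.99. Results are given in percent.

^b The RBR and ARBR are evaluated using RIE*i[m](p), and the RRMSER is evaluated using RIEi[m](p).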

[80] An important assumption of RFA (and the index flood model) is the homogeneity of the region which is checked in the delineation step. To study the effect of the heterogeneity degree of a region, we consider five regions with different heterogeneity degrees: one homogeneous region, two possibly homogeneous regions and two heterogeneous regions. The corresponding results are presented in Table 8. They indicate that, for each fixed value of p in the univariate as well as the bivariate settings, the quantile and the growth curve estimation errors increase with respect to the heterogeneity degree expressed through the mean values of H. It can be concluded that the heterogeneity degree has a negative effect on the performance of the estimation procedure.

Table 8. Quantile Estimation Results for Regions With Different Heterogeneity Degrees for p = 0.99 With n = 30 and N = 15^a

| Region | Criterion | Quantile: Biv^b | Quantile: QDX | Quantile: QDY | Growth Curve: Biv^b | Growth Curve: qDX | Growth Curve: qDY |
|---|---|---|---|---|---|---|---|
| Homog, H mean ≈ 0 | RBR | −0.07 | −0.04 | −0.07 | −0.22 | −0.06 | −0.08 |
| | ARBR | 0.13 | 0.09 | 0.11 | 0.22 | 0.06 | 0.08 |
| | RRMSER | 9.93 | 5.37 | 6.50 | 1.67 | 1.71 | 1.84 |
| HetCo30, H mean = 1.30 | RBR | 0.21 | 0.01 | 0.04 | −0.16 | 0.01 | 0.06 |
| | ARBR | 5.86 | 3.25 | 3.45 | 3.48 | 3.25 | 3.50 |
| | RRMSER | 11.56 | 6.47 | 7.53 | 3.96 | 3.82 | 4.09 |
| HetMa50, H mean = 1.36 | RBR | 1.17 | 0.35 | 0.41 | 0.00 | 0.33 | 0.40 |
| | ARBR | 10.58 | 5.48 | 5.86 | 6.30 | 5.45 | 5.87 |
| | RRMSER | 14.76 | 8.02 | 9.12 | 6.63 | 5.87 | 6.31 |
| HetCo60, H mean = 3.23^c | RBR | 1.45 | 0.44 | 0.56 | 0.24 | 0.47 | 0.58 |
| | ARBR | 11.42 | 6.54 | 7.05 | 6.31 | 6.56 | 7.07 |
| | RRMSER | 15.78 | 8.88 | 10.01 | 6.66 | 6.93 | 7.47 |
| HetCo80, H mean = 5.29^c | RBR | 2.65 | 0.96 | 1.11 | 0.43 | 0.91 | 1.07 |
| | ARBR | 15.39 | 8.87 | 9.54 | 8.45 | 8.84 | 9.55 |
| | RRMSER | 19.08 | 10.83 | 12.04 | 8.73 | 9.13 | 9.86 |

^a Results are given in percent.

^b The RBR and ARBR are evaluated using RIE*i[m](p), and the RRMSER is evaluated using RIEi[m](p).

^c The dependence parameter for these regions is γ = 3.162.

[81] We conclude from Tables 6 and 7 that the impact of n and N is not significant on regional quantile estimation. Note that, in the hydrological context, the variations of n and N are generally small. However, as concluded by Chebana and Ouarda [2007], the effect of n and N is very important in the delineation step. Hence, the impact of the record length n and the region size N is indirect on the estimation step through the homogeneous region selection in the delineation step. Even though the estimation is not greatly affected by increasing values of N, there is still significant gain for carrying out the regionalization (transfer of information from other sites in the region). Indeed, when N = 1, if the target site contains enough data, then quantile estimation can be obtained directly by local FA. The regional methodology is of interest when the target site is ungauged or partially ungauged so that the local estimation is not possible or not efficient. The homogeneity or the possible homogeneity of regions are important conditions to the good performance of the procedure. The above conclusions are similar to those obtained by Hosking and Wallis [1997] for the univariate model.

[82] In the previous results, univariate and bivariate estimates were shown to be different in terms of values but similar in terms of behavior. The explanation lies partially in the difference in the criteria employed to evaluate the performances of each model. Furthermore, the main differences between univariate and bivariate models are conceptual. The univariate estimation results are presented only to be compared to the extreme points of the bivariate quantiles.

6. Conclusions and Future Work

[83] In the present paper we proposed an extension of the index flood model to the multivariate context. The proposed estimation procedure, together with the multivariate discordancy and homogeneity tests, constitutes a complete multivariate RFA procedure. Even though the procedure is shown to be valid in the multivariate setting, the present paper focuses on the bivariate case. The proposed model is based on copulas and on a bivariate quantile version. The bivariate quantile version employed is a curve composed of several statistically similar combinations, since they lead to the same risk. The univariate estimated quantiles, correctly combined, are particular cases corresponding to the extreme scenarios of the bivariate quantile curve. According to the available resources and the nature of the project, one or more convenient scenarios may be selected. Hence, the bivariate setting offers more flexibility to designers than the univariate framework.

[84] Simulation results of the bivariate version of the index flood model are similar to those of the univariate model in terms of behavior of the corresponding performance criteria. The results of univariate FA are provided (as extreme points) by the multivariate FA, and with an equivalent accuracy. The proposed model performs better when the region is close to homogeneity. Its performance is not significantly affected by small variations of the record length of sites or the region size. However, the whole regionalization procedure is affected by these factors through the delineation step. On the other hand, the univariate estimation results remain almost unchanged if the variables are dependent or independent. Hence, the univariate quantile estimates do not take into account the dependence structure of the variables characterizing the event.

[85] In the present study several elements of multivariate RFA are treated. Nevertheless, the following issues, among others, merit development in future efforts: (1) adaptation of the model for the estimation of other events of interest such as the simultaneous exceedance event expressed as {X > x, Y > y}; (2) estimation of the multivariate index flood μ for ungauged sites using their physiographical characteristics; (3) definition of sharp criteria to measure the model performances. Indeed, if other phenomena and other types of copula are considered, then “distances” will be between sets instead of functions, since generally a quantile curve is a “set of points” and not necessarily a function (which is a particular case); (4) development of confidence bands associated with the regional estimates of the quantile curves. This is of interest to evaluate the amount of variation in the curve estimation; (5) a thorough sensitivity study of the impact of different factors that may affect the performances of the model, separately or combined. Such factors include the estimation method of the distribution parameters, the fitted regional distribution including the copula, as well as the effect of a misspecification of the bivariate distribution.

Notation
αX, βX, αY, βY

Parameters of the marginal Gumbel distributions.

γ

Dependence parameter in the Gumbel logistic copula.

ρ

Correlation coefficient.

τ

Kendall tau coefficient.

F, FX and FY

Joint distribution and marginal distributions for the random variables X and Y.

Cγ(.,.)

Gumbel logistic copula with parameter γ.

D

Bivariate discordancy test.

H

Bivariate homogeneity test.

n or ni

Site record length (of site i).

N

Number of sites in a region.

M

Number of replicates for the simulations.

QCp

Bivariate quantile curve associated with a risk p of the nonexceeding event.

Qx,y(p)

A point (a combination) of the curve QCp.

QCx(p) and QCy(p)

Coordinates of the point Qx,y(p), that is Qx,y(p) = (QCx(p), QCy(p)).

QDX(p) and QDY(p)

Univariate quantiles when directly evaluated.

QLX(p) and QLY(p)

Univariate quantiles when deduced as extreme values from the bivariate quantile curve.

Rp[m],i(.)

Coordinatewise relative errors of the proper part of the quantile curve of site i corresponding to the replication m and for a risk p.

Lpi

Length of the proper part of the true quantile curve QCpi of site i and for a risk p.

RIE*i[m](p)

Relative integrated error related to Rp[m],i(.).

RIEi[m](p)

Relative integrated error related to ∣Rp[m],i(.)∣.

Bi(p)

Bias for a site i evaluated on the basis of RIE*i[m](p).

Ri(p)

Root-mean-square error for a site i evaluated on the basis of RIEi[m](p).

RBR(p)

Regional bias evaluated as a mean over the region of Bi(p).

ARBR(p)

Regional absolute bias evaluated as a mean over the region of ∣Bi(p)∣.

RRMSER(p)

Regional quadratic error evaluated as a mean over the region of Ri(p).

Acknowledgments

[86] Financial support for this study was graciously provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Canada Research Chair Program. The authors wish to thank the Editor, Associate Editor, and three anonymous reviewers whose comments helped improve the quality of the paper.