A unified statistical model for hydrological variables including the selection of threshold for the peak over threshold method

Authors

  • S. Solari,

    Corresponding author
    1. Grupo de Dinámica de Flujos Ambientales, Universidad de Granada, Granada, Spain
    2. Centro Interdisciplinario para el Manejo Costero Integrado del Cono Sur, Universidad de la República, Montevideo, Uruguay
      Corresponding author: S. Solari, Grupo de Dinámica de Flujos Ambientales, Universidad de Granada, Av. del Mediterráneo s/n, Granada, ES-18006, Spain. (ssolari@ugr.es)
  • M. A. Losada

    1. Grupo de Dinámica de Flujos Ambientales, Universidad de Granada, Granada, Spain

Abstract

[1] This paper explores the use of a mixture model for determining the marginal distribution of hydrological variables. The model consists of a truncated central distribution representing the central or main-mass regime, which for the cases studied is a lognormal distribution, and of two generalized Pareto distributions for the maximum and minimum regimes, representing the upper and lower tails, respectively. The thresholds defining the limits between these regimes and the central regime are parameters of the model and are calculated together with the remaining parameters by maximum likelihood. After testing the model with a simulation study, we concluded that the upper threshold of the model can be used when applying the peak over threshold method. This yields an automatic and objective identification of the threshold, providing an alternative to existing methods. The model was also applied to four hydrological data series: two mean daily flow series, the Thames at Kingston (United Kingdom) and the Guadalfeo River at Orgiva (Spain), and two daily precipitation series, Fort Collins (CO, USA) and Orgiva (Spain). The model improved the fit of the data series with respect to the lognormal (LN) distribution and, in particular, provided a good fit for the upper tail. Moreover, we concluded that the proposed model is able to accommodate the entire range of values of some significant hydrological variables.

1. Introduction

[2] In the last several years, the development and use of mixture distribution models has increased due to their flexibility [Evin et al., 2011], in that they allow the user to model series of data from different populations. These models have been used with success for various geophysical applications, e.g., statistical downscaling of precipitation [Vrac and Naveau, 2007], simulating extreme rainfall events [Furrer and Katz, 2008], flow analysis [Evin et al., 2011], and time series simulations of sea state parameters [Solari and Losada, 2011].

[3] In general, the mixture models used have been composed of a central distribution and one (or two) distribution(s) of the tail(s). For these models, either the transition (threshold) values between the central and tail distributions are left undefined (e.g., Vrac and Naveau [2007] and Hundecha et al. [2009] use dynamic models that impose a threshold equal to zero), or they are defined a priori using a method different from the one used to estimate the other parameters of the model [Furrer and Katz, 2008]. Recently, Carreau et al. [2009] expressed threshold values as a function of the other parameters of the model. It is therefore worth investigating whether the threshold values can be determined with the same estimation method used for the remaining distribution parameters. If so, it seems reasonable to explore whether the threshold value of the upper tail is a good choice for the threshold required to apply the peaks over threshold (POT) method. This would provide a way of estimating threshold values that complements the one described by Coles [2001], which unfortunately cannot be automated and requires user intervention. The aim of this paper is to analyze the potential of a mixture model to parametrically describe the entire range of values of hydrological variables, and to use the resulting upper threshold values to define the series of peaks in the POT method.

[4] As an alternative to previously used models, we propose a mixture model composed of a truncated central distribution, representative of the central regime, and two generalized Pareto distributions (GPD) for the upper and lower tails, representing the maximum and minimum regimes, respectively. The transition thresholds between the three distributions are parameters of the model and are calculated by maximum likelihood (ML) simultaneously with the other parameters. The first objective (i.e., obtaining a distribution for the entire range of values of a variable) has been found to be particularly useful when long-term simulations of both central and extreme values are required [see, e.g., Solari and Losada, 2011; Solari and van Gelder, 2011].

[5] This paper is organized as follows. Section 2 presents the background, models, and methodologies used and discusses their advantages and disadvantages. Section 3 introduces an alternative mixture model and a working methodology that address the main problems identified in section 2. The behavior of the proposed model is analyzed in section 4 by means of a simulation study. Section 5 presents the results of applying the model to four data sets, analyzing its capacity to fit the entire range of values of the variables, with particular focus on the tails and on the potential use of the upper threshold when applying the POT method. Finally, section 6 summarizes the conclusions. Specific aspects of the estimation of the model parameters are discussed in the Appendix.

2. Background

2.1. Central Regime

[6] When modeling the bulk of the data for hydrologic variables such as streamflow and precipitation, it is common practice to use biparametric distributions such as the gamma, lognormal (LN), or biparametric Weibull for minima (WB) distributions [Chow, 1988]. Usually, these models provide a good fit with the data in the central area around the mean and the mode, but not in the tails [see, e.g., Furrer and Katz, 2008].

[7] There is no theoretical justification for choosing one specific model for the distribution of hydrological variables. However, given the amount of data recorded for the central regime, empirical distribution functions usually provide a sufficiently good estimate of the main mass of the data. Therefore, unlike what happens with extremes, where parametric modeling is required for extrapolation, adequate parametric modeling is not essential for modeling the central regime.

2.2. Extreme Values

[8] Parametric modeling for extreme conditions is required when attempting to infer unrecorded conditions from available data.

[9] Extreme value theory states that the distribution of the maxima or minima of an independent and identically distributed (i.i.d.) series of n elements tends to have one of the three forms of the generalized extreme value distribution. It also states that the distribution of the values that exceed a given threshold of a series of i.i.d. data tends to have a generalized Pareto distribution (GPD) when the threshold tends toward the upper bound of the variable [see, e.g., Coles, 2001; Castillo et al., 2005; Kottegoda and Rosso, 2008].

[10] These results establish the theoretical foundation for the two most widely accepted methods for modeling the extremes of several geophysical variables: the annual maxima (AM) method and the POT method [Leadbetter, 1991]. When the entire time series of data is available, the use of the AM method is associated with a significant loss of information concerning extreme events. The POT method, on the other hand, uses the data more efficiently by considering more than one sample per year. In this sense, the POT method is preferred over the AM method when an entire time series is available [Madsen et al., 1997].

2.3. POT Method

[11] Given a series of i.i.d. data, the values that exceed a sufficiently high threshold follow a GPD [Pickands, 1975]. However, when these values tend to form clusters (e.g., during storm events), as is the case for several geophysical variables, the problem can be addressed in two ways: (1) by declustering the data or (2) by accounting for the dependence of the series. The first framework is the most widely adopted [Coles, 2001] and includes the POT method. The second framework may require more sophisticated statistical models and will not be discussed here.

[12] The POT method is considered to be the declustering method most commonly used by hydrologists and coastal engineers [Coles, 2001]. References to it can be found in Davison and Smith [1990], although, as pointed out by Coles [2001], the general idea is much older. Given a chosen threshold, exceedances separated by less than a given minimum time span are assumed to form a cluster, and each cluster is assumed to be generated by the same extreme event. The maximum recorded value of every cluster defined in this way is retained, which yields a POT series of independent observations. The application of the POT method therefore requires a previously defined threshold as well as a minimum time between threshold exceedance events that ensures the independence of the POT series. The interested reader is referred to Lang et al. [1999] for a review of operational guidelines for over-threshold modeling. The study of alternative declustering methods is beyond the scope of this paper; such information can be found in Ferro and Segers [2003] and references therein. This paper only addresses the application of the most common POT declustering method and provides an alternative method for threshold selection.
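The declustering rule just described can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a regularly sampled series stored in a NumPy array, and the helper name `decluster_pot` and its `min_gap` convention (exceedances whose time indices differ by at most `min_gap` samples belong to the same cluster) are our own hypothetical choices.

```python
import numpy as np


def decluster_pot(x, threshold, min_gap):
    """Return (indices, values) of the cluster peaks above `threshold`.

    Hypothetical helper: exceedances whose time indices differ by at most
    `min_gap` samples are treated as one cluster (one event), and only the
    maximum of each cluster is kept.
    """
    x = np.asarray(x, dtype=float)
    exc = np.flatnonzero(x > threshold)            # indices of all exceedances
    if exc.size == 0:
        return np.array([], dtype=int), np.array([], dtype=float)
    # A new cluster starts wherever the gap to the previous exceedance is too large.
    starts_new = np.diff(exc) > min_gap
    cluster_id = np.concatenate(([0], np.cumsum(starts_new)))
    peak_idx = []
    for c in range(cluster_id[-1] + 1):
        members = exc[cluster_id == c]             # indices belonging to cluster c
        peak_idx.append(members[np.argmax(x[members])])
    peak_idx = np.array(peak_idx, dtype=int)
    return peak_idx, x[peak_idx]


# Toy daily series: with min_gap=2 the first two exceedances form one event.
idx, peaks = decluster_pot([0, 5, 6, 0, 0, 0, 7, 0, 2], threshold=1, min_gap=2)
print(idx.tolist(), peaks.tolist())  # [2, 6] [6.0, 7.0]
```

Measuring the separation between consecutive exceedances, rather than between cluster peaks, matches the description of clusters as exceedances closer together than a minimum time span.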

[13] In this work, once the threshold is chosen or estimated, the minimum time between extreme events is selected such that the autocorrelation of the POT data series is not significantly different from zero and the occurrence of peaks satisfies the Poisson hypothesis. For the former condition, Spearman's rank correlation is used to estimate the lag-one autocorrelation; for the latter, the test for the dispersion coefficient discussed by Cunnane [1979] is used.
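A minimal sketch of the two checks is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the dispersion index is taken as D = Σ(mᵢ − m̄)²/m̄ computed from the annual peak counts mᵢ and compared with a χ² distribution with N − 1 degrees of freedom, a two-sided test at level `alpha` is assumed, and `years` must list every year of record, including years without peaks.

```python
import numpy as np
from scipy import stats


def lag1_spearman(peaks):
    """Spearman rank correlation between consecutive POT peaks, and its p-value."""
    peaks = np.asarray(peaks, dtype=float)
    rho, p_value = stats.spearmanr(peaks[:-1], peaks[1:])
    return rho, p_value


def dispersion_test(peak_years, years, alpha=0.05):
    """Dispersion-index check of the Poisson hypothesis for annual peak counts.

    D = sum((m_i - m_mean)**2) / m_mean, with m_i the number of peaks in year i,
    is compared with a chi-square distribution with N - 1 degrees of freedom
    (two-sided test at level alpha). Returns (D, accepted).
    """
    peak_years = np.asarray(peak_years)
    counts = np.array([np.sum(peak_years == y) for y in years], dtype=float)
    m_mean = counts.mean()
    if m_mean == 0:
        raise ValueError("no peaks in the given years")
    d = np.sum((counts - m_mean) ** 2) / m_mean
    dof = counts.size - 1
    lower = stats.chi2.ppf(alpha / 2, dof)
    upper = stats.chi2.ppf(1 - alpha / 2, dof)
    return d, bool(lower <= d <= upper)
```

In use, a candidate minimum separation would be accepted when the lag-one p-value exceeds the chosen significance level and `dispersion_test` accepts the Poisson hypothesis; otherwise the separation is increased and the series is declustered again.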

2.4. Threshold Identification

[14] The threshold is an essential parameter of the GPD, whether or not the POT method is used. It is, in fact, a distribution parameter (the location parameter of the GPD) and could in principle be estimated once the data series is defined. The problem is that the series of exceedances can only be defined after a threshold has been fixed. Threshold estimation is therefore not straightforward, and it generally cannot be accomplished with the same methods used to estimate the remaining parameters of the GPD.

[15] Despite the importance of the threshold in the analysis of extreme events, the existing methods for threshold identification are, in some way, based on subjective judgment.

[16] Two common ways of choosing a threshold are based on expert judgment. One way is to select a fixed quantile corresponding to a high nonexceedance probability, usually inline image, inline image, or inline image (see, e.g., Luceño et al. [2006] or Smith [1987]). The other way is to impose a minimum on the mean number of clusters per year.
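Both expert-judgment rules can be written in a few lines. The sketch below is illustrative only: the function names are hypothetical, the non-exceedance probability `p` is left to the user (the values usually adopted are not reproduced here), and clusters are defined as runs of exceedances at most `min_gap` samples apart.

```python
import numpy as np


def quantile_threshold(x, p):
    """Threshold equal to the empirical quantile with non-exceedance probability p."""
    return float(np.quantile(np.asarray(x, dtype=float), p))


def count_clusters(x, u, min_gap):
    """Number of clusters of exceedances of u (gaps <= min_gap stay in one cluster)."""
    exc = np.flatnonzero(np.asarray(x, dtype=float) > u)
    if exc.size == 0:
        return 0
    return int(1 + np.sum(np.diff(exc) > min_gap))


def threshold_for_cluster_rate(x, n_years, min_rate, min_gap, candidates):
    """Highest candidate threshold giving at least `min_rate` clusters per year."""
    valid = [u for u in candidates
             if count_clusters(x, u, min_gap) / n_years >= min_rate]
    return max(valid) if valid else None
```

Both rules still rely on a subjective choice (of `p`, or of `min_rate`), which is the limitation discussed in the text.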

[17] There are other methods that provide some guidance for threshold identification and limit the subjectivity of its selection: the graphical method (GM) and the optimal bias-robust estimation (OBRE) method. The GM is based on the stability of the GPD parameters. The OBRE method is based on the procedure used for parameter estimation. These methods are discussed below. Recently, Thompson et al. [2009] proposed a new method that is also based on the stability of the GPD parameters. This method will not be discussed in this work.

[18] The graphical method is based on the stability of the shape and scale parameters of a GPD [see Coles, 2001]. Provided that all of the exceedances of the threshold $u_0$ follow a GPD with parameters $\sigma_{u_0}$ and $\xi$, the exceedances of any other threshold $u > u_0$ must follow a GPD with parameters $\sigma_u$ and $\xi$ that satisfy $\sigma_u = \sigma_{u_0} + \xi\,(u - u_0)$. Then the expected exceedance, $E(X - u \mid X > u) = \sigma_u/(1-\xi) = [\sigma_{u_0} + \xi\,(u - u_0)]/(1-\xi)$, must be a linear function of $u$ for all $u > u_0$ (provided $\xi < 1$).

[19] Consequently, there are two ways of applying the GM. The simplest is to construct the mean residual life plot (MRLP). Given a series of thresholds, the MRLP is the locus of points given by

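$$\left\{\left(u,\ \frac{1}{n_u}\sum_{i=1}^{n_u}\left(x_{(i)}-u\right)\right) : u < x_{\max}\right\}$$

where $x_{(1)},\dots,x_{(n_u)}$ are the $n_u$ observations that exceed $u$ and $x_{\max}$ is the largest observation.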
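The points of the mean residual life plot can be computed directly from the sample. The sketch below is a minimal illustration (the function name is hypothetical, and confidence bands are omitted); the usage lines check the property that, for exponential data (a GPD with ξ = 0), the mean excess stays close to the scale parameter at every threshold.

```python
import numpy as np


def mean_residual_life(x, thresholds):
    """Points (u, mean excess over u) of the mean residual life plot.

    Thresholds with no exceedances (u >= max(x)) are skipped.
    """
    x = np.asarray(x, dtype=float)
    points = []
    for u in thresholds:
        excess = x[x > u] - u
        if excess.size > 0:
            points.append((float(u), float(excess.mean())))
    return points


# For exponential data (GPD with xi = 0) the mean excess should stay close to the scale.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100_000)
for u, e in mean_residual_life(sample, [0.0, 1.0, 2.0, 4.0]):
    print(f"u = {u:.1f}  mean excess = {e:.2f}")
```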

[20] For $u > u_0$, $u_0$ being the threshold above which the GPD is a good approximation of the data, the MRLP should be approximately linear in $u$. The other option is to estimate the GPD scale $\sigma_u$ and shape $\xi$ for several thresholds and to plot the modified scale $\sigma^* = \sigma_u - \xi u$ and the shape $\xi$ against $u$. In this case both plots should be approximately constant for $u > u_0$.
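The parameter-stability variant can be sketched with SciPy's `genpareto`, whose shape parameter `c` has the same sign convention as ξ. This is an illustrative sketch, not the authors' procedure: the function name and the minimum number of exceedances per fit (`min_exceedances=30`) are assumptions, and the location of each GPD is fixed at zero because the excesses x − u are fitted.

```python
import numpy as np
from scipy import stats


def stability_points(x, thresholds, min_exceedances=30):
    """(u, modified scale sigma_u - xi*u, shape xi) for each candidate threshold.

    A GPD is fitted by ML to the excesses x - u with location fixed at zero.
    scipy's `genpareto` shape parameter c uses the same sign convention as xi.
    """
    x = np.asarray(x, dtype=float)
    points = []
    for u in thresholds:
        excess = x[x > u] - u
        if excess.size < min_exceedances:     # too few points for a stable fit
            continue
        xi, _, sigma_u = stats.genpareto.fit(excess, floc=0.0)
        points.append((float(u), float(sigma_u - xi * u), float(xi)))
    return points
```

Plotting the second and third entries against u reproduces the two stability plots; above a suitable threshold both should fluctuate around constant values.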

[21] A major advantage of the GM is that its implementation is straightforward and it reduces the subjectivity associated with threshold selection. However, the GM has a remaining subjective component that requires human judgment and cannot be automated, and it is unable to provide the uncertainty of the threshold estimation.

[22] Dupuis [1998] proposed performing parameter estimation and selecting the threshold by introducing the optimal bias-robust estimator (OBRE), which is an M estimator [see de Zea Bermudez and Kotz, 2010a] that attributes weights (equal to or less than one) to the data used in parameter estimations. Dupuis [1998] suggests using these weights as a guide for choosing the threshold.

[23] Neither the GM nor the method proposed by Dupuis [1998] provides the uncertainty associated with the estimation of the threshold.

2.5. Mixture Models

[24] The types of mixture models used in this work incorporate both the central and extreme populations into a single model and include the thresholds as parameters. Different versions of this approach have recently been used by Frigessi et al. [2002] for modeling a fire loss data set from Denmark, by Vaz de Melo Mendes and Freitas Lopes [2004] for economic and environmental indices, by Behrens et al. [2004] for economic indices, by Tancredi et al. [2006] for daily maximum flow rates, by Vrac and Naveau [2007], Furrer and Katz [2008], and Hundecha et al. [2009] for precipitation, and by Cai et al. [2007, 2008] for significant wave heights. Some authors refer to mixture models composed of nonoverlapping truncated distributions as hybrid models; in this paper we use the term mixture model for both.

3. Model Description

[25] With the exception of the mixture models, all the methods reviewed analyze the central and maxima regimes separately. More specifically, the study of the central regime is based on all available observations, whereas the study of the maxima regime is based on a subsample of data representative of extreme conditions. In line with Vaz de Melo Mendes and Freitas Lopes [2004], Behrens et al. [2004], Tancredi et al. [2006], Cai et al. [2007], and Furrer and Katz [2008], we propose here a parametric mixture model that is valid for the entire range of values of the variable. The model distinguishes three populations: (1) a central regime for the bulk of the data, (2) a minima regime for the lower tail, and (3) a maxima regime for the upper tail. The thresholds that define the limits between these three populations are parameters of the model and are thus estimated in the same way as the other parameters. The proposed model is given by (2), where fc and Fc are the probability density function and the corresponding cumulative distribution function assumed for the central regime, respectively, fm is the probability density function of the minima regime, and fM is the probability density function of the maxima regime:

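$$
f(x)=\begin{cases}
F_c(u_1)\,f_m(x), & x<u_1\\
f_c(x), & u_1\le x\le u_2\\
\left[1-F_c(u_2)\right]f_M(x), & x>u_2
\end{cases}
\qquad (2)
$$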

[26] This model does not consider the time dependence of the data; it is a model for the marginal distribution of the random variable. Its main advantage is that it automatically and objectively calculates the thresholds that delimit the minima, central, and maxima regimes, as well as their uncertainties.

[27] For the central regime, an LN distribution is used:

$$f_c(x)=\frac{1}{x\,\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right],\qquad x>0$$

where μ and σ are the location and scale parameters, respectively (the mean and standard deviation of ln x). A gamma, WB, or any other distribution could be used instead; the procedure described in the following paragraphs is equally valid for any of them. Two GPDs are used to describe the minima and maxima regimes:

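$$f_m(x)=\frac{1}{\sigma_1}\left[1+\xi_1\,\frac{u_1-x}{\sigma_1}\right]^{-1-1/\xi_1},\qquad x<u_1$$
$$f_M(x)=\frac{1}{\sigma_2}\left[1+\xi_2\,\frac{x-u_2}{\sigma_2}\right]^{-1-1/\xi_2},\qquad x>u_2$$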

where u1 and u2 are the lower and upper thresholds of the central regime; σ1 and ξ1 (σ2 and ξ2) are the scale and shape parameters of the minima (maxima) GPD, respectively, such that σ1 > 0 and σ2 > 0. The case ξ = 0, in which the GPD reduces to an exponential distribution with parameter σ, is not considered in this analysis. Moreover, for the minima GPD, x < u1 if ξ1 > 0, and u1 + σ1/ξ1 < x < u1 if ξ1 < 0. Conversely, for the maxima GPD, x > u2 if ξ2 > 0, and u2 < x < u2 − σ2/ξ2 if ξ2 < 0. The proposed model (2) has eight parameters (u1, u2, μ, σ, σ1, ξ1, σ2, ξ2), which are estimated using the ML method.
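A direct transcription of the eight-parameter model (2) is sketched below, assuming (as in the reconstruction used here) that the minima and maxima GPDs are weighted by Fc(u1) and 1 − Fc(u2), so that the total probability is one. The function name is hypothetical; the LN body uses SciPy's `lognorm` with `s = σ` and `scale = exp(μ)`, and the tails use `genpareto` applied to the deficits u1 − x and the excesses x − u2.

```python
import numpy as np
from scipy import stats


def lngpd_pdf(x, mu, sigma, u1, u2, sigma1, xi1, sigma2, xi2):
    """Density of the LN-GPD mixture: lower GPD tail, truncated LN body, upper GPD tail."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    central = stats.lognorm(s=sigma, scale=np.exp(mu))   # LN with log-mean mu, log-sd sigma
    f = np.zeros_like(x)
    lo = x < u1
    hi = x > u2
    mid = ~(lo | hi)
    # Minima regime: GPD of the deficits u1 - x, carrying probability Fc(u1).
    f[lo] = central.cdf(u1) * stats.genpareto.pdf(u1 - x[lo], xi1, scale=sigma1)
    # Central regime: LN density between the thresholds.
    f[mid] = central.pdf(x[mid])
    # Maxima regime: GPD of the excesses x - u2, carrying probability 1 - Fc(u2).
    f[hi] = central.sf(u2) * stats.genpareto.pdf(x[hi] - u2, xi2, scale=sigma2)
    return f
```

With the eight parameters free, the density generally jumps at u1 and u2; the continuity conditions discussed next remove those jumps.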

3.1. Continuity and Physical Bounds

[28] Model (2), as well as the models proposed by Vaz de Melo Mendes and Freitas Lopes [2004], Behrens et al. [2004], Tancredi et al. [2006], and Cai et al. [2007], has discontinuities in the PDF at the threshold values. However, experience does not indicate that such discontinuities exist for geophysical variables. Continuity is therefore imposed on the PDF in (2), which allows the scale parameters of the GPDs to be expressed as functions of the corresponding location parameters (the thresholds) and of the parameters of the central distribution. Also, because the variables studied in section 5 (mean daily flow and daily precipitation) must be positive, the lower bound of the minima GPD is set to zero (u1 + σ1/ξ1 = 0), so that the shape parameter of the minima GPD can be expressed as a function of its scale and location (threshold) parameters.

[29] Using these conditions, the following relationships are obtained:

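$$\sigma_1=\frac{F_c(u_1)}{f_c(u_1)},\qquad \sigma_2=\frac{1-F_c(u_2)}{f_c(u_2)},\qquad \xi_1=-\frac{\sigma_1}{u_1}$$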

[30] As a result, the number of parameters of the model is reduced to five (u1, u2, μ, σ, ξ2). This simplifies the model, and because physical information is specified in advance, its uncertainty is also reduced.
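The dependent parameters can be computed and checked numerically. The sketch below assumes the relations σ1 = Fc(u1)/fc(u1), σ2 = [1 − Fc(u2)]/fc(u2), and ξ1 = −σ1/u1, which follow from continuity of (2) at the thresholds (a GPD density equals 1/σ at its threshold) and from a zero lower bound; the helper name and the parameter values in the usage line are illustrative only.

```python
import numpy as np
from scipy import stats


def tail_parameters(mu, sigma, u1, u2):
    """Dependent tail parameters (sigma1, xi1, sigma2) of the five-parameter LN-GPD.

    sigma1 and sigma2 follow from continuity of the PDF at u1 and u2, and xi1
    from placing the lower bound of the minima GPD at zero.
    """
    central = stats.lognorm(s=sigma, scale=np.exp(mu))
    sigma1 = central.cdf(u1) / central.pdf(u1)      # Fc(u1) * (1/sigma1) = fc(u1)
    sigma2 = central.sf(u2) / central.pdf(u2)       # (1 - Fc(u2)) * (1/sigma2) = fc(u2)
    xi1 = -sigma1 / u1                              # u1 + sigma1/xi1 = 0
    return sigma1, xi1, sigma2


s1, x1, s2 = tail_parameters(mu=0.0, sigma=0.5, u1=0.6, u2=2.0)
print(s1, x1, s2)
```

Since σ1 > 0 and u1 > 0, ξ1 is always negative under these constraints, so the minima GPD is bounded below, as required for positive variables.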

3.2. Parameter Estimation and Confidence Intervals

[31] There are several methods that estimate the distribution parameters based on the observed data. One of the most commonly applied, due to its flexibility and properties, is the ML method [see Coles, 2001, ch. 2]. This is the method we used for this research. For an in-depth discussion of methods for GPD parameter estimation, see de Zea Bermudez and Kotz [2010b, 2010a]. A discussion of the methods used to minimize the negative log likelihood function (NLLF) and the precautions that should be taken in this process can be found in the Appendix.

[32] The confidence intervals of the parameters are estimated from the covariance matrix of the parameter estimates, calculated as the inverse of the information matrix, which is obtained numerically [see Coles, 2001; Castillo et al., 2005]. In this way, the uncertainty associated with the estimation of any parameter of the model, including the threshold u2, is obtained.
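A compact sketch of the estimation step follows: the negative log-likelihood function (NLLF) of the five-parameter model, a derivative-free minimization, and a covariance matrix obtained by inverting a central-difference Hessian of the NLLF (the observed information). It is not the authors' code. The function names, the Nelder-Mead choice, the iteration limit, and the step size are assumptions, as is the model form: tails weighted by Fc(u1) and 1 − Fc(u2), with σ1 = Fc(u1)/fc(u1), σ2 = [1 − Fc(u2)]/fc(u2), and ξ1 = −σ1/u1.

```python
import numpy as np
from scipy import optimize, stats


def lngpd_nll(theta, x):
    """Negative log-likelihood of the five-parameter LN-GPD model.

    theta = (u1, u2, mu, sigma, xi2); sigma1, xi1 and sigma2 are derived from
    PDF continuity and a zero lower bound. Invalid points return +inf.
    """
    u1, u2, mu, sigma, xi2 = theta
    x = np.asarray(x, dtype=float)
    if not (0.0 < u1 < u2) or sigma <= 0.0:
        return np.inf
    c = stats.lognorm(s=sigma, scale=np.exp(mu))
    sigma1 = c.cdf(u1) / c.pdf(u1)
    sigma2 = c.sf(u2) / c.pdf(u2)
    xi1 = -sigma1 / u1
    lo, hi = x < u1, x > u2
    mid = ~(lo | hi)
    logf = np.empty_like(x)
    logf[lo] = np.log(c.cdf(u1)) + stats.genpareto.logpdf(u1 - x[lo], xi1, scale=sigma1)
    logf[mid] = c.logpdf(x[mid])
    logf[hi] = np.log(c.sf(u2)) + stats.genpareto.logpdf(x[hi] - u2, xi2, scale=sigma2)
    nll = -np.sum(logf)
    return float(nll) if np.isfinite(nll) else np.inf


def fit_lngpd(x, theta0):
    """Minimize the NLLF from a user-supplied starting point theta0."""
    res = optimize.minimize(lngpd_nll, np.asarray(theta0, dtype=float), args=(x,),
                            method="Nelder-Mead", options={"maxiter": 2000})
    return res.x, res.fun


def numerical_covariance(f, theta, rel_step=1e-4):
    """Inverse of the central-difference Hessian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    h = rel_step * np.maximum(np.abs(theta), 1.0)
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i], ej[j] = h[i], h[j]
            hess[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                          - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h[i] * h[j])
    return np.linalg.inv(hess)
```

Typical use would be `theta_hat, _ = fit_lngpd(data, theta0)`, followed by `cov = numerical_covariance(lambda t: lngpd_nll(t, data), theta_hat)`, with `np.sqrt(cov[1, 1])` giving a standard error for u2. Because the likelihood is only piecewise smooth in the thresholds, several starting points and a check that the Hessian is positive definite are advisable.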

3.3. Estimations of Extreme Events Using the Proposed Model

[33] When series are i.i.d., the quantiles for any return period can be obtained directly from the model (2). However, geophysical variables are rarely i.i.d. because most of the time high values tend to group together to form clusters (due to storm events in the case of daily flow and precipitation data).

[34] Here we propose to use the upper threshold u2 estimated with the model (2) to construct the POT series. To do this, the minimum time between the end of an exceeding event and the start of a new one is selected in such a way that the resulting POT series meets the Poisson hypothesis [see Cunnane, 1979] and does not show lag one autocorrelation.

[35] Once the POT series is constructed, a GPD is fitted to it by ML using the given threshold. High return period quantiles are then calculated, and their confidence intervals are estimated by means of the likelihood profile [see Coles, 2001]. As an alternative, the probability weighted moments (PWM) method is also used for parameter estimation in the simulation study (section 4); this method outperforms ML when the data series is short [Hosking and Wallis, 1987].
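The fitting and return-level steps can be sketched as follows, under assumptions stated here: the excesses over the threshold are fitted either by ML with SciPy's `genpareto` (location fixed at zero) or with the Hosking and Wallis [1987] PWM estimators (plotting positions (j − 0.35)/n, and ξ = −k in their notation), and the T-year quantile uses the standard POT relation Q_T = u + (σ/ξ)[(λT)^ξ − 1], with λ the mean number of peaks per year. The function names are hypothetical, and likelihood-profile intervals are not shown.

```python
import numpy as np
from scipy import stats


def gpd_fit_ml(excesses):
    """ML estimates (sigma, xi) of a GPD fitted to threshold excesses (location 0)."""
    xi, _, sigma = stats.genpareto.fit(excesses, floc=0.0)
    return sigma, xi


def gpd_fit_pwm(excesses):
    """PWM estimates (sigma, xi) following Hosking and Wallis (1987), with xi = -k."""
    y = np.sort(np.asarray(excesses, dtype=float))
    n = y.size
    p = (np.arange(1, n + 1) - 0.35) / n        # plotting positions
    a0 = y.mean()                               # estimate of E[X]
    a1 = np.mean((1.0 - p) * y)                 # estimate of E[X (1 - F(X))]
    sigma = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    xi = 2.0 - a0 / (a0 - 2.0 * a1)
    return sigma, xi


def return_level(u, sigma, xi, peaks_per_year, T):
    """T-year quantile for POT peaks: GPD excesses and Poisson occurrence rate."""
    m = peaks_per_year * T                      # expected number of peaks in T years
    if abs(xi) < 1e-9:
        return u + sigma * np.log(m)
    return u + sigma / xi * (m ** xi - 1.0)
```

The PWM estimators exist only for ξ < 1/2 (finite variance), which is one reason they behave well for the short POT series considered in the simulation study.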

[36] Alternative paths to extreme value estimation using the proposed model may be explored either by using the result that, in the limit, the distribution of the cluster peaks is the same as the distribution of all exceedances [Leadbetter, 1991], or by using the entire data series instead of only the peaks. For the latter case the reader is referred to Fawcett and Walshaw [2007].

4. Simulation Study

[37] A simulation study was performed (a) to analyze the ability of model (2) (hereafter called LNGPD) to correctly estimate the upper threshold u2 and (b) to explore the usefulness of this threshold for applying the POT method.

[38] In this study a known LNGPD was used, with parameters inline image, inline image, inline image, inline image, 0.7, and 1.2. The cumulative probabilities of the thresholds u2 were 0.79, 0.9, and 0.99, respectively. To impose temporal dependence on the series, a first-order Markov model was used, with the dependence between consecutive values modeled by a Gumbel-Hougaard copula with parameter θ = 1 (independent series), θ = 2 (moderate dependence), and θ = 4 (strong dependence) [see, e.g., Fawcett and Walshaw, 2006; Salvadori et al., 2007].
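The first-order Markov construction can be sketched by conditional inversion of the copula: given the previous uniform value u, the next value v solves ∂C(u, v)/∂u = w for a fresh uniform w. The helper names are hypothetical, and the sketch stops at the uniform scale; obtaining LNGPD values would additionally require the LNGPD quantile function, which is not shown. For this copula θ = 1 gives independence and Kendall's τ = 1 − 1/θ.

```python
import numpy as np
from scipy.optimize import brentq


def gumbel_h(v, u, theta):
    """P(V <= v | U = u) = dC(u, v)/du for the Gumbel-Hougaard copula (theta >= 1)."""
    lu, lv = -np.log(u), -np.log(v)
    s = (lu ** theta + lv ** theta) ** (1.0 / theta)
    return np.exp(-s) * s ** (1.0 - theta) * lu ** (theta - 1.0) / u


def simulate_gumbel_markov(n, theta, rng, eps=1e-12):
    """Uniform(0, 1) series whose consecutive pairs follow a Gumbel-Hougaard copula."""
    u = np.empty(n)
    u[0] = rng.uniform(eps, 1.0 - eps)
    for t in range(1, n):
        w = rng.uniform(eps, 1.0 - eps)
        # Conditional inversion: solve h(v | u[t-1]) = w for v.
        u[t] = brentq(lambda v: gumbel_h(v, u[t - 1], theta) - w, eps, 1.0 - eps)
    return u
```

Because the conditional inversion preserves uniform margins, the same marginal model can be imposed afterwards regardless of θ, which is what allows the dependence level to be varied independently of the LNGPD parameters.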

[39] One thousand series were simulated for each model, each consisting of 25 years of daily data (i.e., 9132 values per series). An LNGPD model was fitted to each series. Next, using the estimated upper threshold û2, a POT series was constructed and a GPD was fitted to it. This GPD was used to estimate the 50, 100, and 250 year return period quantiles (Q̂TR). In parallel, the known threshold u2 was also used to construct POT series and estimate the quantiles. The GPD parameters of the POT series were estimated with both the ML and PWM methods [Hosking and Wallis, 1987].

[40] The bias and root mean square error (RMSE) of both û2 and Q̂TR were analyzed using the known values. The expected values of QTR were estimated by simulating 250,000 independent years and were used to estimate the bias and RMSE of the Q̂TR obtained using the estimated (û2) and true (u2) threshold values.
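The error measures can be computed as below. This is a sketch under assumptions, since the exact definitions are not spelled out here: relative bias is taken as the mean of (estimate − true)/true and relative RMSE as the root mean square of the same ratio, and the reference QTR is taken as the empirical 1 − 1/T quantile of the annual maxima of a long simulated record. The function names are hypothetical.

```python
import numpy as np


def relative_bias_rmse(estimates, true_value):
    """Relative bias and relative RMSE of a set of estimates of a positive quantity."""
    rel_err = (np.asarray(estimates, dtype=float) - true_value) / true_value
    return float(rel_err.mean()), float(np.sqrt(np.mean(rel_err ** 2)))


def empirical_return_level(annual_maxima, T):
    """Empirical T-year quantile from a long record of simulated annual maxima."""
    return float(np.quantile(np.asarray(annual_maxima, dtype=float), 1.0 - 1.0 / T))
```

With 250,000 simulated years, even the 250 year quantile is estimated from about 1000 exceedances of the annual maxima, so its sampling error is small compared with that of a 25 year record.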

[41] Table 1 shows the relative bias and RMSE of û2. Tables 2 and 3 show the relative bias and relative RMSE, respectively, of the 250 year return period quantile. Table 4 summarizes the number of simulations (out of 1000) that were discarded because they had fewer than 2 peaks over threshold and/or produced unrealistically large Q̂TR values (i.e., several orders of magnitude greater than the true value). Figures 1 and 2 show the results obtained for inline image.

Figure 1.

Relative bias and RMSE for threshold u2 and absolute bias and RMSE for parameter inline image. For the plots on the right, black symbols correspond to results obtained with the estimated threshold û2 and white symbols to results obtained with the real threshold u2. Simulations made with inline image (squares), inline image (diamonds), and inline image (triangles); and with inline image.

Figure 2.

Relative bias and RMSE for quantiles of 50, 100, and 250 years return period. Black symbols correspond to results obtained with the estimated threshold û2 and white symbols to results obtained with the real threshold u2. Simulations made with inline image (squares), inline image (diamonds), and inline image (triangles); and with inline image.

Table 1. Relative BIAS and RMSE for Threshold u2 of the GPD Model, Based on 1000 Simulation of 25 Years of Daily Data(a)

               ξ = −0.4            ξ = −0.2            ξ = 0               ξ = 0.2             ξ = 0.4
Stat  u0       θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4
BIAS  0.55   −0.00 −0.01 −0.03   −0.00  0.02  0.04    0.01  0.03  0.06   −0.00 −0.00 −0.01   −0.00 −0.01 −0.02
      0.7    −0.01 −0.02 −0.03    0.00  0.01  0.00    0.04  0.04  0.04   −0.00 −0.00 −0.04   −0.00 −0.01 −0.03
      1.2    −0.01 −0.04 −0.11   −0.00 −0.03 −0.11    0.02 −0.02 −0.10    0.00 −0.06 −0.09   −0.00 −0.06 −0.13
RMSE  0.55    0.01  0.03  0.09    0.08  0.14  0.24    0.18  0.24  0.31    0.06  0.11  0.21    0.03  0.05  0.11
      0.7     0.02  0.04  0.11    0.06  0.13  0.22    0.19  0.28  0.34    0.08  0.16  0.23    0.04  0.07  0.17
      1.2     0.04  0.10  0.22    0.10  0.16  0.24    0.20  0.25  0.26    0.15  0.25  0.30    0.09  0.22  0.32

(a) u2 = 0.55, 0.7, and 1.2 correspond to cumulative probabilities of 0.79, 0.9, and 0.99, respectively.

Table 2. Relative BIAS for 250 Years Return Period Quantile, Estimated With the GPD Adjusted to the POT Series Obtained With the Estimated and the Real Threshold, Based on 1000 Simulation of 25 Years of Daily Data

                   ξ = −0.4            ξ = −0.2            ξ = 0               ξ = 0.2             ξ = 0.4
u0    Fit  Thr.    θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4
0.55  ML   û2    −0.01  0.01  0.14   −0.03  0.03  0.26   −0.08  0.10  0.46   −0.13  0.20  0.76   −0.14  0.27  0.76
      ML   u2    −0.01  0.01  0.12   −0.03  0.03  0.27   −0.08  0.09  0.49   −0.14  0.18  0.71   −0.15  0.25  0.81
      PWM  û2    −0.05  0.06  0.32   −0.09  0.09  0.40   −0.13  0.16  0.45   −0.15  0.18  0.48   −0.11  0.14  0.30
      PWM  u2    −0.04  0.06  0.27   −0.09  0.10  0.39   −0.14  0.15  0.47   −0.16  0.16  0.44   −0.12  0.13  0.29
0.7   ML   û2    −0.01  0.00  0.11   −0.02  0.01  0.18   −0.04  0.04  0.25   −0.04  0.15  0.39   −0.03  0.19  0.40
      ML   u2    −0.01  0.00  0.07   −0.02  0.02  0.16   −0.04  0.06  0.29   −0.05  0.12  0.44   −0.04  0.17  0.54
      PWM  û2    −0.02  0.05  0.23   −0.04  0.06  0.28   −0.07  0.07  0.27   −0.04  0.13  0.26   −0.03  0.09  0.15
      PWM  u2    −0.02  0.04  0.17   −0.03  0.07  0.24   −0.05  0.10  0.28   −0.05  0.11  0.27   −0.03  0.08  0.16
1.2   ML   û2    −0.01 −0.01  0.09   −0.02 −0.02  0.09   −0.02 −0.01  0.11    0.16  0.07  0.15    0.38  0.24  0.17
      ML   u2    −0.01 −0.02 −0.02   −0.01 −0.02  0.00   −0.00  0.01  0.06    0.05  0.09  0.20    0.17  0.27  0.51
      PWM  û2    −0.00  0.03  0.13   −0.02  0.02  0.11   −0.02  0.00  0.06    0.07 −0.01 −0.01    0.03 −0.05 −0.10
      PWM  u2     0.01  0.03  0.03    0.01  0.03  0.02    0.01  0.03  0.00    0.01  0.01 −0.04   −0.01 −0.05 −0.11

Fit: method used to estimate the GPD parameters of the POT series. Thr.: threshold used to build the POT series (û2 estimated, u2 true).
Table 3. Relative RMSE for 250 Years Return Period Quantile, Estimated With the GPD Adjusted to the POT Series Obtained With the Estimated and the Real Threshold, Based on 1000 Simulation of 25 Years of Daily Data

                   ξ = −0.4            ξ = −0.2            ξ = 0               ξ = 0.2             ξ = 0.4
u0    Fit  Thr.    θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4     θ=1   θ=2   θ=4
0.55  ML   û2     0.01  0.02  0.16    0.04  0.06  0.31    0.13  0.20  0.69    0.19  0.35  1.01    0.25  0.52  1.14
      ML   u2     0.01  0.02  0.14    0.04  0.06  0.32    0.10  0.17  0.62    0.18  0.34  0.99    0.24  0.52  1.28
      PWM  û2     0.05  0.07  0.35    0.10  0.12  0.45    0.17  0.24  0.60    0.21  0.31  0.65    0.28  0.45  0.64
      PWM  u2     0.05  0.07  0.30    0.09  0.13  0.43    0.15  0.21  0.55    0.20  0.30  0.60    0.27  0.44  0.64
0.7   ML   û2     0.01  0.02  0.28    0.03  0.06  0.26    0.12  0.19  0.48    0.20  0.36  0.69    0.30  0.54  0.86
      ML   u2     0.01  0.02  0.11    0.04  0.07  0.24    0.09  0.17  0.47    0.18  0.34  0.80    0.28  0.56  1.14
      PWM  û2     0.03  0.07  0.27    0.06  0.10  0.35    0.13  0.20  0.42    0.20  0.32  0.50    0.33  0.49  0.60
      PWM  u2     0.03  0.07  0.21    0.06  0.12  0.30    0.10  0.19  0.40    0.18  0.30  0.50    0.32  0.48  0.62
1.2   ML   û2     0.03  0.06  0.37    0.07  0.12  0.40    0.22  0.48  0.71    0.70  0.71  1.45    1.55  1.41  1.76
      ML   u2     0.03  0.06  0.15    0.07  0.13  0.28    0.18  0.30  0.56    0.42  0.73  1.23    0.83  1.69  2.86
      PWM  û2     0.05  0.09  0.24    0.08  0.13  0.23    0.21  0.22  0.24    0.36  0.36  0.32    0.54  0.53  0.50
      PWM  u2     0.06  0.09  0.13    0.10  0.14  0.18    0.18  0.23  0.27    0.33  0.37  0.39    0.55  0.55  0.54

Fit: method used to estimate the GPD parameters of the POT series. Thr.: threshold used to build the POT series (û2 estimated, u2 true).
Table 4. Number of Failed Cases Over 1000
u0ξ–0.4–0.200.20.4
θ124124124124124
0.55                
 ML inline image1
 u2
 PWM inline image
 u2
0.7                
 ML inline image1
 u2
 PWM inline image
 u2
1.2                
 ML inline image1158441335231220
 u2 66677
 PWM inline image11561410212115
 u2 55555

[42] Regarding û2 (Table 1 and Figure 1), positive bias was observed when ξ was close to zero and negative bias otherwise. In general, the relative bias was lower than 0.1 in absolute value, except when the threshold was high (u2 = 1.2) and dependence was strong (θ = 4); in this latter case the bias reached values slightly higher than 0.1. The RMSE increased with the dependence of the series (θ), with the threshold, and when ξ was close to zero, reaching values of the order of 0.3.

[43] Regarding Q̂TR (Tables 2 and 3 and Figure 2), both bias and RMSE increase as ξ, u2, and θ increase. However, the bias and RMSE obtained using the estimated threshold û2 are very similar to those obtained using the true threshold u2.

[44] It is remarkable that, although û2 shows non-negligible bias and RMSE, the bias and RMSE of the high return period quantiles obtained with û2 and with u2 are very similar; i.e., the increase in the bias and RMSE of the threshold û2 observed when ξ was close to zero did not affect the bias and RMSE of Q̂TR.

[45] As for the limitations of the proposed methodology, we observed that for high thresholds (cumulative probability of the order of 0.99), and particularly for series with strong dependence (θ = 4), the number of failed simulations obtained with the estimated threshold û2 increases in comparison with the number obtained using the true threshold u2 (see Table 4). Note that these simulations do not allow absolute limits to be established. However, users should bear in mind that when the estimated threshold û2 is relatively high, depending on the autocorrelation and the length of the series being analyzed, the resulting POT series may be too short to produce reliable results, which would prevent the use of û2 in such cases.

[46] With the exception of a few cases, the results obtained with the estimated threshold û2 do not outperform those obtained with the real threshold. However, the bias and RMSE of high return period quantiles obtained with both thresholds are of the same order of magnitude. This supports the use of û2, obtained by fitting the LNGPD model by ML, for applying the POT method to estimate high return period quantiles of nonindependent series.

5. Application

[47] In this section the LNGPD was used to model four data sets. For each case, the fit obtained with the LNGPD model is compared with the fit obtained with the LN distribution. In addition, we examined the results obtained by using the u2 threshold for the POT method and compared them with those obtained with the threshold selected by the GM.

[48] Daily precipitation series from Fort Collins, USA, and Orgiva, Spain, and mean daily flow series from the Thames at Kingston, UK, and the Guadalfeo River at Orgiva, Spain, were used. These data series encompass different situations. The Fort Collins and Thames River series are very long, while the Orgiva flow series is short. For the precipitation series (Fort Collins and Orgiva), as for the Orgiva flow series, there is no physical basis for imposing an upper bound on the variables, and the distributions of their extreme values are therefore expected to be heavy tailed. In contrast, the Thames River is regulated to prevent flooding, so for this series an upper limit on the maximum values of the variable may be justified, except in those exceptional cases in which the flood control system was exceeded.

5.1. Daily Precipitation at Fort Collins

[49] We analyzed the daily precipitation series from the Fort Collins, CO, station, obtained from the Colorado Climate Center (http://ccc.atmos.colostate.edu/dly-form.html). The same data set was used previously by Katz et al. [2002] and Furrer and Katz [2008].

[50] The series covers the period from 1900 to 2010. Of the 40,543 available records, only 9036 are nonzero, equivalent to inline image of the total.

[51] First, the WB, LN, and gamma distributions were fitted to the nonzero data. None of the three candidate distributions correctly fits both the central part and the upper tail. We therefore chose the least poor distribution according to the value of the likelihood function and the p values [Benjamin and Cornell, 1970] of the inline image and Kolmogorov-Smirnov tests (both tests rejected all three distributions). According to these criteria, the chosen distribution is the LN. Other tests are available for selecting the central distribution; however, these two tests proved adequate for the applications discussed in this work.
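The fitting-and-testing step can be illustrated for the LN case with a minimal standard-library sketch. This is not the authors' code, and the function names are illustrative. The sketch uses the closed-form ML estimates of the LN (the mean and the 1/n standard deviation of the log-data), evaluates the log-likelihood, and computes the Kolmogorov-Smirnov distance. The WB and gamma fits are omitted because their ML estimates require an iterative solution. When the parameters are estimated from the same data, standard KS p values are only approximate (the test becomes conservative).

```python
import math
from statistics import NormalDist


def fit_lognormal(data):
    """ML estimates (mu, sigma) of a lognormal fitted to positive data.

    The ML estimates are the mean and the 1/n standard deviation of log(x).
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


def lognormal_loglik(data, mu, sigma):
    """Log-likelihood of positive data under LN(mu, sigma)."""
    total = 0.0
    for x in data:
        z = (math.log(x) - mu) / sigma
        total += -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return total


def ks_distance(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare F(x) with the ECDF just after (i+1)/n and just before i/n.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    # Illustrative values only (not the Fort Collins data).
    sample = [0.3, 0.5, 1.2, 2.0, 2.4, 3.8, 5.1, 7.7, 12.0, 25.4]
    mu, sigma = fit_lognormal(sample)
    ln_dist = NormalDist(mu, sigma)
    d_ks = ks_distance(sample, lambda x: ln_dist.cdf(math.log(x)))
    print(mu, sigma, lognormal_loglik(sample, mu, sigma), d_ks)
```

The same log-likelihood value is what enters the comparison between candidate distributions, and later the AIC.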

[52] To improve the fit, the LNGPD mixture model was used, with its parameters estimated by ML. The estimated u1 was lower than the minimum of the nonzero data (inline image). The parameters were therefore re-estimated imposing inline image, thus discarding the GPD of the minima. The estimated parameters and their variances are listed in Table 5. The LNGPD model fits the data better than the LN, particularly in the upper tail, as shown in the Q-Q plot in Figure 3. The Akaike Information Criterion (AIC) [Akaike, 1974] was used to check for overparameterization. The decrease in AIC obtained with the LNGPD (inline image and inline image) shows that the improvement in fit obtained with the mixture distribution is significant.
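The AIC check can be written in a few lines, with AIC = 2k − 2 ln L, where k is the number of free parameters and lower values are preferred. The sketch assumes k = 2 for the LN and, for illustration only, k = 4 for the LNGPD with u1 imposed. The four parameters would be u2, two LN parameters, and one upper-tail parameter, matching the four columns of Table 5. The log-likelihood values in the example are hypothetical. With two extra parameters the penalty grows by 4, so the mixture is preferred only if it raises ln L by more than 2.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


def prefer_richer_model(loglik_simple, k_simple, loglik_rich, k_rich):
    """True if the model with more parameters has the lower AIC."""
    return aic(loglik_rich, k_rich) < aic(loglik_simple, k_simple)


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    ll_ln, ll_mix = -1000.0, -996.5
    print(aic(ll_ln, 2), aic(ll_mix, 4), prefer_richer_model(ll_ln, 2, ll_mix, 4))
```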

Figure 3.

QQ plot. Fitted LN (red asterisks) and fitted LNGPD (green circles) models. Fort Collins nonzero daily precipitation (mm) data.

Table 5. Parameters of the LNGPD Model Fitted to the Fort Collins Nonzero Daily Precipitation Data, Imposing inline image
Parameter | u2 | inline image | inline image | inline image
Estimate | 9.8 | 0.68 | 1.39 | 0.196
Variance | 0.15 | inline image | inline image | inline image

[53] The upper threshold u2 obtained by fitting the LNGPD model was 9.8 mm d−1, with a confidence interval of inline image inline image. Its cumulative probability is inline image for the nonzero data and inline image for the entire data set. In inches, the threshold is 0.39 in., almost equal to the threshold used by Katz et al. [2002] for the POT method (0.40 in.). However, they note that their threshold was deliberately chosen relatively low so that it would be valid for all seasons, because their study focused on modeling the annual cycle of the distribution parameters.
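The two cumulative probabilities quoted above are linked through the fraction of wet days, assuming all zero (dry) days lie below any positive threshold: F_all(u) = P(dry) + P(wet)·F_wet(u) = 1 − p_wet·(1 − F_wet(u)). The sketch below applies this relation with the record counts from [50] and also checks the unit conversion. The value 0.9 in the example is illustrative, not a result from the paper.

```python
MM_PER_INCH = 25.4


def prob_all_days(f_nonzero, n_nonzero, n_total):
    """Cumulative probability of a positive threshold among all days,
    given its cumulative probability among the nonzero (wet) days.

    F_all(u) = 1 - p_wet * (1 - F_wet(u)), valid for u > 0.
    """
    p_wet = n_nonzero / n_total
    return 1.0 - p_wet * (1.0 - f_nonzero)


if __name__ == "__main__":
    print(round(9.8 / MM_PER_INCH, 2))  # 0.39 (inches)
    # Record counts of the Fort Collins series, 1900-2010; 0.9 is illustrative.
    print(prob_all_days(0.9, 9036, 40543))
```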

[54] Next, we studied the feasibility of using the upper threshold u2 for the POT method. To distinguish the analysis of the POT data from that of the entire data set, the threshold used to construct the POT series is denoted u, and the shape parameter of the GPD of the maxima fitted to the POT series is denoted inline image.

[55] First, we checked whether the resulting POT series satisfy the Poisson hypothesis and whether they show lag-one autocorrelation, for different thresholds u and different minimum times between exceedances inline image. The Poisson hypothesis was checked with the test described by Cunnane [1979], and autocorrelation with the Spearman rank correlation between consecutive peaks. Table 6 summarizes the results. For thresholds of 40 and 50 mm d−1, the POT series were inadequate for every value of inline image. For thresholds below 40 mm d−1, adequate POT series were obtained with inline image less than 6 days. We selected inline image for the rest of the analysis, although similar results were obtained with inline image between 0 and 4 days.

Table 6. Verification of the Poisson Hypothesis and the Autocorrelation of the POT Series at Fort Collins (a)
u (mm d−1) | Statistic | Minimum interstorm time (days)
   | | 0 | 2 | 4 | 6 | 8 | 10 | 12 | 14
10 | inline image | −0.00 | −0.00 | −0.02 | −0.04 | −0.03 | −0.02 | −0.01 | −0.01
   | D | 1.08 | 1.04 | 0.85 | 0.77 | 0.69 | 0.64 | 0.59 | 0.54
20 | inline image | −0.02 | 0.00 | −0.00 | −0.03 | −0.03 | −0.02 | −0.02 | −0.02
   | D | 0.95 | 0.92 | 0.84 | 0.77 | 0.75 | 0.76 | 0.73 | 0.73
30 | inline image | 0.06 | 0.05 | 0.04 | 0.05 | 0.05 | 0.04 | 0.03 | 0.05
   | D | 1.17 | 1.18 | 1.19 | 1.10 | 1.07 | 1.06 | 1.07 | 1.06
40 | inline image | −0.17 | −0.19 | −0.19 | −0.21 | −0.25 | −0.25 | −0.25 | −0.25
   | D | 1.32 | 1.29 | 1.29 | 1.24 | 1.15 | 1.15 | 1.15 | 1.15
50 | inline image | −0.38 | −0.38 | −0.38 | −0.38 | −0.46 | −0.46 | −0.46 | −0.46
   | D | 1.17 | 1.17 | 1.17 | 1.17 | 1.01 | 1.01 | 1.01 | 1.01
60 | inline image | −0.21 | −0.21 | −0.21 | −0.21 | −0.21 | −0.21 | −0.21 | −0.21
   | D | 1.03 | 1.03 | 1.03 | 1.03 | 1.03 | 1.03 | 1.03 | 1.03

(a) Bold numbers correspond to inline image significantly different from zero or dispersion D significantly different from one (in both cases at the inline image significance level). Combinations (u, InterMin) with at least one bold number are considered inadequate for extreme value analysis with the POT method.

[56] Table 6 shows that, although the u2 threshold yields a relatively high number of events per year (inline image), the POT series obtained with inline image shows no autocorrelation and is consistent with a Poisson process. The threshold is therefore considered valid for the POT analysis.
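The dispersion index D reported in Tables 6 and 8 is the ratio of the variance to the mean of the annual event counts. Under the Poisson hypothesis, D equals 1. The sketch below assumes that the Cunnane [1979] test takes the form of the classical dispersion-index test: (M − 1)·D is approximately chi-square with M − 1 degrees of freedom, where M is the number of years. To stay within the standard library, the chi-square CDF is approximated with the Wilson-Hilferty normal transformation. Years without events must be included as zero counts.

```python
import math
from statistics import NormalDist, mean, variance


def annual_counts(peak_years, years):
    """Number of POT events in each year of record (zeros included)."""
    counts = {y: 0 for y in years}
    for y in peak_years:
        counts[y] += 1
    return [counts[y] for y in years]


def dispersion_test(counts):
    """Dispersion index D = s^2 / mean of annual counts and a two-sided
    p value for H0: Poisson counts (D = 1).

    Under H0, (M - 1) * D is approximately chi-square with M - 1 degrees
    of freedom; its CDF is approximated with the Wilson-Hilferty transform.
    """
    m = len(counts)
    d = variance(counts) / mean(counts)  # sample variance uses M - 1
    k = m - 1
    stat = k * d
    z = ((stat / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    phi = NormalDist().cdf(z)
    return d, 2.0 * min(phi, 1.0 - phi)
```

D > 1 indicates overdispersion (more clustering of events in some years than a Poisson process allows), and D < 1 indicates underdispersion.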

[57] Figure 4 shows the results of applying the GM with inline image. Two possible thresholds were identified from the MRLP: 40 and 30 mm d−1. When the inline image confidence intervals are taken into account, a threshold close to 10 mm d−1 cannot be ruled out. The 40 mm d−1 threshold is discarded for two reasons. First, for every Tmin it yields POT series that either do not meet the Poisson hypothesis or show autocorrelation (see Table 6). Second, it yields less than one event per year, so many users would discard it in favor of the AM method. Other users, however, may argue that the POT series is more informative about extreme values than the AM series, and that using the latter would bias the estimated quantiles. The shape parameter is relatively stable for u > 30 mm d−1, so this threshold was chosen for the extreme value analysis. Its cumulative probability is inline image for the nonzero data and inline image for all data. As with the MRLP, when confidence intervals (estimated with the profile likelihood method) are taken into account, a threshold as low as 12 mm d−1 cannot be ruled out.
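The MRLP on which the GM relies plots, for each candidate u, the mean excess e(u), that is, the mean of x − u over the values x > u. If the exceedances follow a GPD with shape ξ < 1, e(u) is linear in u with slope ξ/(1 − ξ), so the threshold is taken as the lowest u above which the plot is approximately linear. The sketch below computes these values with a normal-approximation confidence half-width. That interval is an assumption here, since this section does not say how the confidence intervals in Figure 4 were computed.

```python
import math
from statistics import mean, stdev


def mean_residual_life(data, thresholds, z=1.96):
    """Mean residual life (mean excess) plot values.

    For each u returns (u, e(u), n_u, half_width), where e(u) is the mean of
    x - u over the n_u values x > u and half_width = z * s / sqrt(n_u) is a
    normal-approximation confidence half-width (None if n_u < 2).
    """
    rows = []
    for u in thresholds:
        excess = [x - u for x in data if x > u]
        if not excess:
            continue  # no exceedances: nothing to plot
        n = len(excess)
        half = z * stdev(excess) / math.sqrt(n) if n > 1 else None
        rows.append((u, mean(excess), n, half))
    return rows


def shape_from_slope(slope):
    """GPD shape xi implied by a linear MRL slope b = xi / (1 - xi)."""
    return slope / (1.0 + slope)
```

A positive slope in the linear part of the plot therefore indicates a heavy tail (ξ > 0), and a negative slope indicates a light tail.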

Figure 4.

Mean residual life plot for Fort Collins daily precipitation (mm) POT series.

[58] It is instructive to compare the shape parameters inline image of the GPDs fitted to the POT series obtained with the different thresholds. For inline image mm d−1, the shape parameter is 0.19, with a inline image profile-likelihood confidence interval of inline image; that is, the GPD has a heavy tail. For inline image, the shape parameter is 0.01, with a confidence interval of inline image inline image. The point estimate again corresponds to a heavy tail, but the inline image confidence interval does not rule out a light tail.
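The profile-likelihood interval for the shape parameter can be sketched as follows. This is a minimal sketch, not the authors' implementation. The GPD is fitted to the excesses y = x − u. For each ξ on a grid (restricted to ξ > −1), the log-likelihood is maximized over σ with a golden-section search on log σ; the profile is unimodal in σ for fixed ξ in that range. The interval collects the ξ values with 2(ℓ_max − ℓ_p(ξ)) ≤ χ²₁ critical value. Because the confidence level is not recoverable from this text, 95% (3.841) is used for illustration.

```python
import math


def gpd_loglik(y, sigma, xi):
    """GPD log-likelihood of excesses y = x - u > 0 (scale sigma, shape xi)."""
    if sigma <= 0:
        return -math.inf
    n = len(y)
    if abs(xi) < 1e-9:  # exponential limit as xi -> 0
        return -n * math.log(sigma) - sum(y) / sigma
    total = 0.0
    for v in y:
        t = xi * v / sigma
        if t <= -1.0:
            return -math.inf  # outside the support
        total += math.log1p(t)
    return -n * math.log(sigma) - (1.0 + 1.0 / xi) * total


def _golden_max(f, a, b, iters=60):
    """Golden-section search for the maximum of a unimodal f on [a, b]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc >= fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


def profile_loglik(y, xi):
    """Max over sigma of the GPD log-likelihood for fixed xi (xi > -1)."""
    ymax = max(y)
    # For xi < 0 the support requires sigma > -xi * max(y).
    lower = -xi * ymax * (1.0 + 1e-9) if xi < 0 else 1e-6 * ymax
    log_s, ll = _golden_max(lambda s: gpd_loglik(y, math.exp(s), xi),
                            math.log(lower), math.log(10.0 * ymax))
    return ll, math.exp(log_s)


def profile_ci_shape(y, xi_grid, chi2_crit=3.841):
    """Grid MLE of xi and its profile-likelihood interval
    {xi : 2 * (l_max - l_p(xi)) <= chi2_crit}."""
    prof = [(xi, profile_loglik(y, xi)[0]) for xi in xi_grid]
    xi_hat, l_max = max(prof, key=lambda t: t[1])
    inside = [xi for xi, l in prof if 2.0 * (l_max - l) <= chi2_crit]
    return xi_hat, min(inside), max(inside)
```

An interval that lies entirely above zero indicates a heavy tail. An interval that straddles zero, as for the second threshold above, does not rule out a light tail.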

[59] Figure 5 shows the inline image profile-likelihood confidence intervals of the return period quantiles obtained with the GPDs fitted to the POT series for thresholds inline image mm d−1 and inline image mm d−1. The figure does not make clear which threshold is more suitable for extreme value estimation, so the decision ultimately depends on the user's criteria.
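Return-period quantiles follow from the GPD fitted to the POT series together with the mean number of events per year λ (number of peaks divided by number of years). The sketch below uses the standard Poisson-GPD return-level formula: x_T = u + (σ/ξ)[(λT)^ξ − 1], or u + σ ln(λT) when ξ = 0. This is the usual textbook form; the exact formula used for Figure 5 is not stated in this section. In this formula, T is interpreted as the mean recurrence interval of POT events, and λT ≥ 1 is required.

```python
import math


def pot_return_level(u, sigma, xi, rate, period):
    """T-year return level from a Poisson-GPD (POT) model.

    rate   : mean number of POT events per year (n_peaks / n_years)
    period : return period T in years (requires rate * period >= 1)
    x_T = u + sigma / xi * ((rate * T) ** xi - 1), or u + sigma * ln(rate * T) if xi = 0.
    """
    m = rate * period  # expected number of events in T years
    if abs(xi) < 1e-9:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)
```

Because the quantiles grow roughly like (λT)^ξ, small differences in ξ between thresholds translate into large differences at long return periods. This is why the choice between thresholds matters most in the upper part of Figure 5.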

Figure 5.

Fort Collins daily precipitation (mm d−1) POT series and inline image confidence intervals of the GPDs fitted using inline image and inline image.

[60] We conclude that, for the nonzero daily rainfall series at Fort Collins, CO, the LNGPD distribution adequately models the full range of rainfall values and provides an upper threshold that is a suitable alternative for applying the POT method.

5.2. Mean Daily Flow at Thames at Kingston

[61] We used the mean daily flow values (gauged daily flow) from the River Thames station Thames at Kingston. Data were obtained from the UK Centre for Ecology & Hydrology (http://www.ceh.ac.uk/data/nrfa/data/time_series.html?3900). The series covers the period from 1883 to 2009, with a total of 46,386 values. The same series was used by Eastoe and Tawn [2010] to study alternatives to the Poisson model for the occurrence of extreme events.

[62] As with the Fort Collins series, the WB, LN, and gamma distributions were fitted and evaluated. The LN distribution provided the best fit, although it fits the tails poorly. The LNGPD mixture model was then fitted, significantly improving the fit relative to the LN model (AIC decreased from 477,956 to 476,285). Figure 6 presents the Q-Q plots for both distributions. Table 7 lists the estimated parameters of the LNGPD model and their variances.

Figure 6.

QQ plot. Fitted LN (red asterisks) and fitted LNGPD (green circles) models. Thames at Kingston mean daily flow (m3 s−1).

Table 7. Parameters of the LNGPD Model and Their Variances Fitted to the Thames at Kingston Mean Daily Flow Data
Parameter | u1 | u2 | inline image | inline image | inline image
Estimate | 7.2 | 124.5 | 3.71 | 1.07 | −0.092
Variance | 0.014 | 1.8 | inline image | inline image | inline image

[63] Because this is a closely regulated river, an analysis of the extremes of the flow series may not be advisable for engineering applications. Nevertheless, we studied the behavior of the POT series because it is instructive for exploring the capabilities of the LNGPD model.

[64] As for the Fort Collins series, we first checked whether the POT series satisfy the Poisson hypothesis and whether they show autocorrelation, for several thresholds and several minimum times between events. Table 8 summarizes the results. For inline image, every threshold yields a POT series that satisfies the Poisson hypothesis and shows no lag-one autocorrelation. Therefore, inline image was used for applying the GM.

Table 8. Verification of the Poisson Hypothesis and the Autocorrelation of the POT Series at Thames at Kingston (a)
u (m3 s−1) | Statistic | Minimum interstorm time (days)
   | | 0 | 2 | 4 | 6 | 8 | 10 | 12 | 14
100 | inline image | −0.05 | −0.01 | −0.00 | −0.04 | −0.03 | −0.03 | −0.04 | −0.05
    | D | 1.70 | 1.34 | 1.14 | 1.01 | 0.90 | 0.85 | 0.80 | 0.78
140 | inline image | 0.01 | 0.02 | 0.03 | 0.02 | 0.01 | −0.02 | −0.07 | −0.08
    | D | 1.51 | 1.28 | 1.18 | 1.08 | 0.91 | 0.86 | 0.83 | 0.74
180 | inline image | 0.10 | 0.11 | 0.11 | 0.14 | 0.12 | 0.09 | 0.07 | 0.05
    | D | 1.39 | 1.22 | 1.02 | 0.93 | 0.80 | 0.76 | 0.82 | 0.81
220 | inline image | 0.04 | 0.06 | 0.03 | 0.02 | 0.03 | 0.01 | −0.02 | −0.03
    | D | 1.69 | 1.43 | 1.32 | 1.08 | 1.08 | 1.02 | 0.90 | 0.88
260 | inline image | 0.00 | 0.03 | 0.04 | 0.01 | −0.02 | −0.03 | −0.05 | −0.14
    | D | 1.91 | 1.67 | 1.39 | 1.34 | 1.21 | 1.16 | 1.07 | 0.93
300 | inline image | −0.07 | 0.01 | −0.07 | −0.07 | −0.09 | −0.09 | −0.09 | −0.06
    | D | 1.60 | 1.38 | 1.32 | 1.34 | 1.20 | 1.18 | 1.09 | 1.08

(a) Bold numbers correspond to inline image significantly different from zero or dispersion D significantly different from one (in both cases at the inline image significance level). Combinations (u, InterMin) with at least one bold number are considered inadequate for extreme value analysis with the POT method.

[65] Eastoe and Tawn [2010] defined the POT series with u = 200 m3 s−1 and inline image. As Table 8 shows, the resulting POT series is overdispersed, as Eastoe and Tawn [2010] also found.

[66] Figure 7 presents the GM plots. According to these plots, the appropriate threshold for applying the POT method is u = 220 m3 s−1 (cumulative probability inline image). This is higher than the threshold obtained from the LNGPD mixture model (inline image m3 s−1, cumulative probability inline image). The plots also show that the GPD fitted with threshold u = 200 m3 s−1 will have a heavy tail, whereas the GPD fitted with inline image m3 s−1 will have a light tail.

Figure 7.

Mean residual life plot for the Thames at Kingston mean daily flow (m3 s−1) POT series.

[67] A GPD was fitted to each of the POT series built with the two identified thresholds (124 and inline image m3 s−1). Figure 8 shows the inline image confidence intervals for the quantiles obtained with these GPDs. The fit obtained with the 124 m3 s−1 threshold is good, even for the exceptional values of the mean daily flow. For high return periods, the GPD tracks the data trend well, and only the floods of 1894 and 1947 lie at the upper limit of the confidence interval. In contrast, these two peaks fall well within the inline image confidence intervals of the GPD fitted with the 220 m3 s−1 threshold.

Figure 8.

Thames at Kingston mean daily flow (m3 s−1) POT series and inline image confidence intervals estimated using u = 124 m3 s−1 and u = 220 m3 s−1.

[68] According to Marsh et al. [2005], the 1947 flood was produced by an extraordinary snowmelt process. The value given in the series for the 1894 flood was revised and estimated to be around 800 m3 s−1; it is an estimated rather than a gauged value. To complement the analysis, we studied the behavior of the GM when the two years of exceptional floods were removed from the series. The 1947 flood was removed under the assumption that it belongs to a different population (snowmelt generated), and the 1894 flood because its value is estimated rather than gauged. This is only an exploratory analysis; we do not recommend excluding these two values from the series.

[69] The resulting MRLP and inline image plot are included in Figure 7 as dashed lines. The new MRLP still shows two trends, but here both are qualitatively similar and both correspond to a GPD with a light tail. Unlike in the original plots, a threshold lower than 220 m3 s−1 is not ruled out. This analysis may indicate that, for selecting the threshold required by the POT method, the MRLP is more sensitive to outliers than the LNGPD model. However, further research is needed before a definitive conclusion can be reached.

[70] In summary, the LNGPD mixture model improves the fit over the entire range of values of the mean daily flow recorded at Thames at Kingston. At the same time, the model identifies the threshold needed to apply the POT method. In this case, the threshold obtained is considerably lower than that obtained with the GM, and the GPD fitted to the resulting POT series has a light tail. We consider this consistent with the degree of regulation of the river. Ultimately, however, the choice of threshold for extreme value analysis depends on the user's criteria.

5.3. Orgiva Streamflow and Precipitation Series

[71] Lastly, we analyzed two shorter data sets from a Mediterranean basin on the Iberian Peninsula: daily precipitation and mean daily flow series from Orgiva, Spain (coordinates inline image, inline image). Descriptions of this basin can be found in Herrero et al. [2009], Millares et al. [2009], and Mans et al. [2011].

5.3.1. Precipitation

[72] A daily precipitation series of 16,948 values from 1961 to 2008 was used. The series has inline image missing values and inline image zero values.

[73] Rainfall in Orgiva has a bimodal distribution, with a first peak at very low daily precipitation values (0.1 mm). Carreau et al. [2009] proposed approximating this type of distribution with a hybrid Pareto mixture model composed of several normal-Pareto mixture distributions similar to model (2). Here, however, we aim to show that the LNGPD mixture model can explore and model this series without resorting to a more complex model such as that of Carreau et al. [2009]. For this reason, records below 0.3 mm d−1, which form the first mode, were discarded for this analysis.

[74] Of the WB, LN, and gamma distributions fitted to the data greater than 0.3 mm d−1, the LN distribution provides the best fit. However, it fits the upper tail poorly for values greater than 30 mm d−1. The LNGPD distribution was therefore fitted, yielding a significant improvement in fit (AIC decreased from 14,540 with the LN to 14,514 with the LNGPD) and an estimated upper threshold of inline image mm d−1. Figure 9 shows the QQ plots for the LNGPD and the LN.

Figure 9.

QQ plot of daily precipitation values (mm d−1) in Orgiva. LN (red asterisks) and LNGPD (green circles).

[75] For the POT analysis, lag-one autocorrelation and the Poisson hypothesis were checked for several thresholds and values of Tmin. The results show that, with inline image, the POT series obtained with thresholds between 10 and 50 mm d−1 are not autocorrelated and satisfy the Poisson hypothesis.

[76] Figure 10 shows the GM plots for the POT series, from which a threshold of u = 48 mm d−1 is selected. Because this threshold yields only inline image yr−1, a lower threshold of u = 26 mm d−1 (where the slope of the MRLP changes slightly) is also selected. This lower threshold yields inline image yr−1, while inline image mm d−1 yields inline image yr−1.

Figure 10.

MRLP of the daily precipitation POT series for Orgiva.

[77] Figure 11 shows the inline image confidence intervals obtained with the three thresholds. The GPDs obtained with thresholds inline image mm d−1 and inline image mm d−1 yield very similar quantiles; however, the confidence intervals obtained with the latter are considerably wider.

Figure 11.

Orgiva daily precipitation (mm d−1) POT series and inline image confidence intervals estimated using inline image mm d−1, inline image mm d−1, and inline image mm d−1.

5.3.2. Streamflow

[78] For the mean daily streamflow series, 5546 values from the period 1991 to 2009 were used, with several periods of missing data (inline image missing). In total, there were 15 years of data. No specific treatment was applied to the missing data.

[79] There are two flow populations in the Orgiva basin: one generated by snowmelt and the other by surface runoff. The two populations can be distinguished if the statistical analysis is complemented with physically based models (see, e.g., Herrero et al. [2009] and Millares et al. [2009] for a description of such models). In a purely statistical analysis, however, the two populations overlap and together follow an LN distribution, although this distribution fits the tails poorly (see Figure 12).

Figure 12.

QQ plot of the mean daily streamflow (m3 s−1) in Orgiva. LN (red asterisks) and LNGPD (green circles).

[80] The LNGPD model was fitted with the lower threshold fixed at zero because, as with the Fort Collins series, the estimated lower threshold was below the minimum of the data. Figure 12 presents the QQ plot, which shows that the LNGPD model fits significantly better than the LN distribution (AIC decreased from 23,353 to 23,328).

[81] For the POT method, inline image is required to ensure that the POT series shows no autocorrelation and satisfies the Poisson hypothesis. The MRLP (Figure 13) indicates that the appropriate threshold for fitting the GPD to the POT series is about 8 m3 s−1. This is similar to the threshold estimated with the LNGPD model (inline image m3 s−1). For this series, therefore, the LNGPD provides essentially the same threshold as the GM, so for brevity the results of fitting the GPD to the POT series are not included.

Figure 13.

MRLP of the mean daily streamflow POT series in Orgiva.

[82] In summary, the LNGPD model provides a good fit to the full range of values of both the daily precipitation and the mean daily streamflow series recorded in Orgiva, except for precipitation values below inline image mm d−1. In both cases, it identifies a suitable upper threshold for applying the POT method.

6. Conclusions

[83] This paper explored the use of a mixture model (LNGPD) for the marginal distribution of hydrological variables. This distribution comprises a truncated central distribution that is representative of the central regime, which was the LN distribution for the cases analyzed, and two GPDs for the upper and lower tails, to represent the maxima and minima regimes, respectively.

[84] The LNGPD model can represent the entire range of values of some important hydrological variables, such as precipitation and streamflow, by treating the data as coming from three different populations. The thresholds are model parameters and are estimated by ML. Consequently, the threshold calculation is automatic and objective and does not require any parameter to be predefined. By determining the thresholds u1 and u2 together with their uncertainty, the model delimits the minima, central, and maxima regimes of the hydrological variable.

[85] A significant advantage of this model and methodology over existing ones is that the estimated u2 can be used as the threshold required to compute high-return-period quantiles with the POT method. To fully exploit the model, a methodology is needed that includes the threshold uncertainty in the confidence intervals of high-return-period quantiles. This is the subject of ongoing research.

[86] It may also be possible to use u1 to study the minima regime with a peaks-below-threshold method. However, specific research on this topic is needed before this can be confirmed, because the characteristics of the lower tail (droughts) differ from those of the upper tail (storms and floods).

[87] The proposed mixture model was tested through a simulation study and then applied to four series of hydrological data: two of mean daily flow, the Thames at Kingston (UK) and the Guadalfeo River at Orgiva (Spain), and two of daily precipitation, Fort Collins (CO, USA) and Orgiva (Spain).

[88] The simulation study established both the strengths and the limitations of the model. The main advantage of the proposed methodology is its robustness in estimating high-return-period quantiles. Even when the threshold estimate is biased, the bias and RMSE of the quantiles computed with the estimated threshold are similar to those obtained with the known threshold. However, the model can only be applied to series with “enough” exceedances over the threshold. For 25-year series of daily data with a threshold equal to the inline image quantile and strong autocorrelation, the model failed to provide an adequate estimate of the high-return-period quantiles in up to inline image of the cases.

[89] Regarding the application to real data series, the LNGPD mixture model improved the fit relative to the LN distribution and, in particular, provided a good fit in the upper tail. In addition, in all four cases studied, the thresholds obtained from the LNGPD model were suitable for applying the POT method. For some series this threshold differs from the one obtained with the GM, and it tends to be lower. Users should therefore keep in mind the trade-off between bias and variance when choosing the threshold. A lower threshold provides more exceedances and hence lower variance, but increases the risk of bias if the GPD approximation does not yet hold at that level.

[90] Finally, when the threshold is estimated with the proposed method, the only choice left to the user for constructing a POT series is the minimum interevent time. An objective criterion could also be adopted for this choice, such as minimizing the deviation of the dispersion coefficient from one. This would allow the POT series to be constructed in a fully objective way.
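The criterion suggested above can be sketched as a search over candidate minimum interevent times. For each candidate, the POT series is built at the given threshold, annual event counts are computed, and the dispersion index D = s²/mean is evaluated; the candidate with D closest to one is kept. The declustering convention (a new event after at least `t_min` consecutive values at or below u) and all function names are hypothetical. In practice, the lag-one autocorrelation check used in Tables 6 and 8 would still be applied to the selected series.

```python
from statistics import mean, variance


def decluster(series, u, t_min):
    """Cluster peaks as (index, value); hypothetical convention: a new
    cluster starts after at least t_min consecutive values <= u."""
    peaks, last = [], None
    for i, x in enumerate(series):
        if x <= u:
            continue
        if last is None or i - last - 1 >= t_min:
            peaks.append((i, x))
        elif x > peaks[-1][1]:
            peaks[-1] = (i, x)
        last = i
    return peaks


def select_t_min(series, years, u, candidates):
    """Choose the minimum interevent time whose annual POT counts have a
    dispersion index D = s^2 / mean closest to one (Poisson).

    years[i] is the year of series[i]; returns (t_min, D).
    """
    record_years = sorted(set(years))
    best = None
    for t in candidates:
        counts = dict.fromkeys(record_years, 0)  # years without events count as 0
        for i, _ in decluster(series, u, t):
            counts[years[i]] += 1
        c = [counts[y] for y in record_years]
        d = variance(c) / mean(c)
        if best is None or abs(d - 1.0) < abs(best[1] - 1.0):
            best = (t, d)
    return best
```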

6.1. Issues to Explore Further

[91] First, it would be useful to perform a simulation study of how sensitive the results are to misspecification of the central distribution and, in particular, to quantify the sensitivity of the lower and upper thresholds.

[92] Second, questions remain about how to use this model to deal with the seasonality, multiyear cycles, and trends observed in the data. Beyond the traditional approach of dividing the series into stationary subsets, one possibility is to include nonstationarity in the model. This approach is partially explored by Solari and Losada [2011] and Solari and van Gelder [2011].

Appendix A: ML Estimation

A1. Uniqueness of the Solution

[93] The proposed LNGPD model has five parameters. This is more parameters than are usually estimated by ML when applying a parametric model (e.g., the lognormal and gamma distributions have two parameters, and the generalized extreme value distribution has three).

[94] We checked whether the parameter-fitting method actually maximizes the likelihood function, as follows. First, a set of u1 and u2 thresholds (inline image) covering the entire range of values taken by the variable was defined. Then, for each pair of values (inline image), the other three model parameters (inline image) were estimated by ML. Finally, the iso-probability curves (contours of the log likelihood) were drawn in the inline image plane, and the ML point (inline image) was identified.
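The check described above can be organized as a grid evaluation of the profile log likelihood over (u1, u2), followed by a count of local maxima on the grid. In the sketch, `fit_given_thresholds(u1, u2)` is a hypothetical user-supplied function. It stands in for the ML fit of the three remaining LNGPD parameters, which is not reproduced here, and returns the maximized log likelihood. Drawing the contours themselves (e.g., with a plotting library) is omitted.

```python
def profile_surface(fit_given_thresholds, u1_values, u2_values):
    """Evaluate the maximized log-likelihood on a (u1, u2) grid with u1 < u2.

    fit_given_thresholds(u1, u2) is a user-supplied (hypothetical) function
    that fits the remaining parameters by ML and returns the maximized
    log-likelihood. Returns the surface and its grid maximizer.
    """
    surface = {}
    for u1 in u1_values:
        for u2 in u2_values:
            if u1 < u2:
                surface[(u1, u2)] = fit_given_thresholds(u1, u2)
    best = max(surface, key=surface.get)
    return surface, best


def local_maxima(surface, u1_values, u2_values):
    """Grid points strictly higher than all existing 8-neighbours; a single
    entry is consistent with a unique maximum of the surface."""
    found = []
    for i, u1 in enumerate(u1_values):
        for j, u2 in enumerate(u2_values):
            if (u1, u2) not in surface:
                continue
            v = surface[(u1, u2)]
            neighbours = [
                surface.get((u1_values[a], u2_values[b]))
                for a in range(max(i - 1, 0), min(i + 2, len(u1_values)))
                for b in range(max(j - 1, 0), min(j + 2, len(u2_values)))
                if (a, b) != (i, j)
            ]
            if all(w is None or v > w for w in neighbours):
                found.append((u1, u2))
    return found
```

A single local maximum on a sufficiently fine grid is consistent with the unique maximum reported for the Thames series. More than one would signal that a local optimizer could stop at the wrong point.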

[95] Figure A1 presents the iso-probability curves for the Thames data series. Note that the surface has a unique maximum that corresponds to the values of the thresholds identified in section 5.

Figure A1.

Log likelihood function (LLF) as a function of the u1 and u2 thresholds for the Thames at Kingston data series.

A2. Optimization Method

[96] The NLLF was minimized using the BFGS quasi-Newton method [see, e.g., Nocedal and Wright, 2006, ch. 6] as implemented in the MATLAB® Optimization Toolbox.

[97] Measured or hindcast data are commonly truncated to a given precision, so the data cluster at specific values. We observed that this produces local minima and maxima in the log likelihood function, which can cause problems for the optimization algorithms used to minimize the NLLF. One way to address this is to use more complex optimization procedures (e.g., global optimization). Instead, we chose to distribute the data uniformly within the intervals defined by the precision. This solution is preferable for two reasons. First, it is easier to implement. Second, the confidence intervals depend on the curvature of the LLF at the optimum through the information matrix, and uniformly distributed data give a more realistic curvature, as explained below.
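Spreading the recorded values over their precision intervals can be sketched in a few lines. The text says the data are "truncated" to a given precision. Whether this means rounding (interval [x − p/2, x + p/2)) or flooring (interval [x, x + p)) is not specified, so both variants are provided, with the choice left as an assumption for the user. Passing a seeded `random.Random` makes the result reproducible. The function name is hypothetical.

```python
import random


def spread_within_precision(data, precision, rounded=True, rng=None):
    """Replace each recorded value by a uniform draw over its recording interval.

    rounded=True : values were rounded, interval [x - p/2, x + p/2)
    rounded=False: values were truncated (floored), interval [x, x + p)
    """
    rng = rng or random.Random()
    offset = -0.5 * precision if rounded else 0.0
    return [x + offset + precision * rng.random() for x in data]


if __name__ == "__main__":
    flows = [12.3, 12.3, 12.3, 12.4]  # illustrative values recorded to 0.1
    print(spread_within_precision(flows, 0.1, rng=random.Random(42)))
```

The jittered sample no longer has tied values, which removes the artificial steps in the likelihood surface that ties create.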

[98] Figure A2 shows the LLF for the Thames data series. Using the uniformly distributed data substantially smooths the likelihood function, avoiding the singular values observed with the original data. Furthermore, the original data clearly produce a fictitious curvature that may affect the calculation of the confidence intervals.

Figure A2.

Log likelihood function (LLF) as a function of the u1 and u2 thresholds, for the original data (top) and for the uniformly distributed data (bottom). Thames at Kingston mean daily flow data.

Acknowledgments

[99] The authors acknowledge Caroline Keef and two other anonymous reviewers whose comments certainly helped to improve this paper both in its content and its readability. This research was funded by the Spanish Ministry of Education through its postgraduate fellowship program, grant AP2009-03235. Partial funding was also received from the Spanish Ministry of Science and Innovation (research project CTM2009-10520), the Spanish Ministry of Public Works (projects CIT-460000-2009-21 and 53/08 –orden FOM/3864/2008–), and the Andalusian Regional Government (research project P09-TEP-4630).