Corresponding author: M. Roth, EURANDOM, Eindhoven University of Technology, PO Box 513, Eindhoven, 5600 MB, Netherlands. (firstname.lastname@example.org)
 Regional frequency analysis is often used to reduce the uncertainty in the estimation of distribution parameters and quantiles. In this paper a regional peaks-over-threshold model is introduced that can be used to analyze precipitation extremes in a changing climate. We use a temporally varying threshold, which is determined by quantile regression for each site separately. The marginal distributions of the excesses are described by generalized Pareto distributions (GPD). The parameters of these distributions may vary over time and their spatial variation is modeled by the index flood (IF) approach. We consider different models for the temporal dependence of the GPD parameters. Parameter estimation is based on the framework of composite likelihood. Composite likelihood ratio tests that account for spatial dependence are used to test the significance of temporal trends in the model parameters and to test the IF assumption. We apply the method to gridded, observed daily precipitation data from the Netherlands for the winter season. A general increase of the threshold is observed, especially along the west coast and northern parts of the country. Moreover, there is no indication that the ratio between the GPD scale parameter and the threshold has changed over time, which implies that the scale parameter increases by the same percentage as the threshold. These positive trends lead to an increase of rare extremes of on average 22% over the country during the observed period.
 Design values for infrastructure are often based on characteristics of extreme precipitation. These characteristics may have changed over time owing to climate change, see, e.g., Klein Tank and Können  and Milly et al. , which contradicts the stationarity assumption that is usually made in hydrologic and hydraulic design. Wrongly assuming stationarity generally leads to systematic errors in design values and might have a considerable impact on the risk of failure of hydraulic structures, as shown by Wigley . Climate scientists have analyzed trends in moderate extremes that occur once or several times per year, based on annual indices. Examples are the empirical annual 90% quantile of the precipitation amounts on wet days or the 1 day or 5 day maximum precipitation amount in each year, see, e.g., Klein Tank and Können  and Turco and Llasat .
 In this study we focus on rare extremes which occur less frequently than once per year. These are often assessed by extreme value (EV) models that are fitted to block maxima (BM), e.g., the largest value in a year or season. Considering only BM discards useful data in the case of multiple (independent) extremes in a block, see, e.g., Madsen et al. [1997a], Lang et al. , and Kyselý et al. . We follow a common alternative method to analyze extremes by considering all values that exceed a certain high threshold, which is known as peaks-over-threshold (POT) modeling, see, e.g., Coles . A potential advantage of POT modeling is the possibility to include more data in the analysis than in the BM approach, which may reduce the estimation variance. A reduction of the estimation variance may also be achieved by analyzing the r-largest maxima in a block, see Coles  and Zhang et al. . However, this method has the potential drawback of using values that may not represent extreme values, e.g., in a dry year.
 Because of the rarity of the extremes, the parameters in the EV models and in particular large quantiles of the precipitation amounts have wide confidence intervals. To reduce the uncertainty in the estimates, the use of data sets covering a long period and/or regional frequency analysis (RFA) has been recommended [e.g., Hosking and Wallis, 1997]. Long time series are rare and sometimes not available for a certain region. However, relatively short records are often available for many sites in the region. The idea behind RFA is to exploit the similarities between the sites in that region, so that all data in the region can be used to obtain quantile estimates for a particular site. RFA approaches to estimate properties of extremes in a stationary climate have been used quite often with BM data, see for an overview Cunnane , Hosking and Wallis , and more recently Svensson and Jones , but rarely with POT data [Madsen et al., 1997b]. For nonstationary extremes only very few studies have considered an RFA approach, among them Westra and Sisson , who use a max-stable process model for BM, and Hanel et al., who apply an index flood (IF) approach, also to BM data. The IF approach is a popular method in RFA. It assumes that the distributions of the extreme precipitation amounts are identical after scaling with a site-specific factor (the index variable). We mimic the approach by Hanel et al. and develop a POT model with time-varying parameters that satisfies the IF assumption. The threshold varies linearly over time and is determined for each site separately. The distribution of the excesses of the threshold is modeled by the generalized Pareto distribution (GPD). Its parameters may vary linearly over time and their spatial variation is determined by the IF assumption. We apply this model in a case study to gridded observed daily precipitation data for the winter period over the Netherlands.
 In section 2 we describe the proposed model. We explain the basic methods used to deal with high quantile estimation in the case of stationary data with emphasis on the POT approach. After that, we present our model for the nonstationary climate. In section 3 we outline the estimation procedure. The choice between different models is addressed in section 4 and in section 5 the application of the model to observed daily precipitation data in the Netherlands is discussed.
2. Model Description
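 The data we describe with our model consist of measurements at S sites over a period of T time points. The data can be represented in a space-time matrix
$$\mathbf{X} = \left(x_{s,t}\right) = \begin{pmatrix} x_{1,1} & \cdots & x_{1,T} \\ \vdots & \ddots & \vdots \\ x_{S,1} & \cdots & x_{S,T} \end{pmatrix},$$
where $x_{s,t}$ is the value at site s and time t, $s = 1, \dots, S$ and $t = 1, \dots, T$.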
 In POT modeling exceedances over a high threshold are considered, $x_{s,t} > u_{s,t}$, $s = 1, \dots, S$, $t = 1, \dots, T$. This threshold is generally site specific and may depend on time. In the case of temporal clustering of the exceedances only the largest value in a cluster (peak) is considered. These peaks will then generally be approximately independent. We assume that the $x_{s,t}$ have been declustered and we define $y_{s,t}$ as the difference between the daily value at site s and time t and the corresponding value of the threshold, i.e.,
$$y_{s,t} = x_{s,t} - u_{s,t},$$
and $\mathbf{Y}$ is defined analogously to $\mathbf{X}$. The excesses are the nonnegative part of $\mathbf{Y}$. Note that owing to the declustering $y_{s,t}$ is only nonnegative if there is a peak. By $\mathcal{T}$ we denote the subset of days where at least one threshold excess occurs, i.e.,
$$\mathcal{T} = \left\{ t \in \{1, \dots, T\} : y_{s,t} > 0 \text{ for at least one } s \right\}.$$
2.1. Stationary Climate
2.1.1. Site Specific Approach
 The BM approach for a stationary climate relies on the Fisher-Tippett-Gnedenko theorem for maxima of independent and identically distributed (i.i.d.) random variables. Under certain regularity conditions, this theorem allows one to approximate the distribution of the BM by an extreme value distribution, see, e.g., Embrechts et al. . The three types of extreme value distributions can be summarized in the generalized extreme value (GEV) family with distribution function
$$G(x) = \exp\left\{ -\left[ 1 + \tilde\xi\,\frac{x - \mu}{\tilde\sigma} \right]^{-1/\tilde\xi} \right\}$$
for $1 + \tilde\xi\,(x - \mu)/\tilde\sigma > 0$, where $\mu$, $\tilde\sigma > 0$, and $\tilde\xi$ are the location, scale, and shape parameter. $\tilde\xi > 0$ corresponds to the Fréchet family, $\tilde\xi < 0$ to the Weibull family, and $\tilde\xi = 0$ (interpreted as the limit $\tilde\xi \to 0$) to the Gumbel family.
 When we consider the POT approach rather than block maxima, we have to model the process of exceedance times and the distribution of the excesses separately. In a stationary climate the threshold u is constant and the times of exceedance are usually modeled by a homogeneous Poisson process. This implies that the mean number λ of exceedances in a block (e.g., year or a particular season) is constant over time.
 The Balkema-de Haan-Pickands theorem states that the distribution of i.i.d. excesses can be approximated by a generalized Pareto distribution, if the threshold u is sufficiently high and certain regularity conditions hold, see, e.g., Reiss and Thomas :
$$H(y) = 1 - \left( 1 + \frac{\xi y}{\sigma} \right)^{-1/\xi}$$
for $y \ge 0$ if $\xi \ge 0$ and $0 \le y < -\sigma/\xi$ if $\xi < 0$, where σ and ξ are the scale and the shape parameter. For $\xi = 0$ the GPD reduces to the exponential distribution, $H(y) = 1 - \exp(-y/\sigma)$.
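 We are interested in the level $x_\alpha$ which is exceeded on average α times in a block ($0 < \alpha < \lambda$). Since there are on average λ peaks in a block, the probability that an arbitrary peak exceeds $x_\alpha$ equals $\alpha/\lambda$, i.e., the excess level $x_\alpha - u$ is not exceeded with probability $1 - \alpha/\lambda$. Therefore, to obtain $x_\alpha$ we add the threshold to the $(1 - \alpha/\lambda)$ quantile of the excess distribution:
$$x_\alpha = u + H^{-1}\!\left( 1 - \frac{\alpha}{\lambda} \right) = u + \frac{\sigma}{\xi} \left[ \left( \frac{\lambda}{\alpha} \right)^{\xi} - 1 \right]. \tag{1}$$
We will sometimes refer to the quantile $x_{1/m}$ as the m-year return level to make the comparison with studies for a stationary climate easier. For seasonal data a return period of m years means that we expect on average $1/m$ excesses per year in the specific season.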
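The return level in equation (1) is straightforward to evaluate. The following minimal Python sketch computes $x_\alpha$ and the m-year return level, treating ξ = 0 as the exponential limit. The function names and the example values are hypothetical and are not estimates from this study.

```python
import math


def return_level(u, sigma, xi, lam, alpha):
    """Level exceeded on average `alpha` times per block, equation (1).

    u: threshold; sigma, xi: GPD scale and shape of the excesses;
    lam: mean number of peaks per block (alpha must be smaller than lam).
    """
    if not 0.0 < alpha < lam:
        raise ValueError("need 0 < alpha < lam")
    ratio = lam / alpha
    if abs(xi) < 1e-12:
        # xi -> 0: exponential excesses, (ratio**xi - 1) / xi -> log(ratio)
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)


def m_year_level(u, sigma, xi, lam, m):
    """m-year return level: alpha = 1/m exceedances per block (season)."""
    return return_level(u, sigma, xi, lam, 1.0 / m)


# Hypothetical values, for illustration only.
x25 = m_year_level(u=10.0, sigma=4.0, xi=0.1, lam=3.61, m=25)
```

For example, `m_year_level(u, sigma, xi, lam, 25)` evaluates the level that is exceeded on average once in 25 seasons, i.e., α = 0.04.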
 If one assumes that the exceedance times originate from a homogeneous Poisson process and the excesses are independent and follow a GPD, it can be shown that the following relationship between the parameters of the GEV and the GPD holds [Buishand, 1989; Wang, 1991; Madsen et al., 1997b]:
$$\tilde\xi = \xi, \qquad \tilde\sigma = \sigma \lambda^{\xi}, \qquad \mu = u + \frac{\sigma}{\xi} \left( \lambda^{\xi} - 1 \right). \tag{2}$$
Note that the derived GEV distribution is defined only for BM greater than u.
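Relationship (2) can be checked numerically. If the number of peaks per block is Poisson with mean λ and the excesses follow a GPD, the block maximum satisfies $\Pr(M \le x) = \exp[-\lambda\{1 - H(x - u)\}]$ for x > u. The sketch below (hypothetical helper names, illustrative parameter values) compares this with a GEV distribution function whose location, scale and shape are computed from (2).

```python
import math


def gev_from_pot(u, sigma, xi, lam):
    """GEV (location, scale, shape) of block maxima implied by relationship (2):
    Poisson(lam) peaks per block with GPD(sigma, xi) excesses over u."""
    if abs(xi) < 1e-12:
        return u + sigma * math.log(lam), sigma, 0.0
    return u + sigma / xi * (lam ** xi - 1.0), sigma * lam ** xi, xi


def gev_cdf(x, mu, scale, shape):
    """GEV distribution function; shape = 0 is the Gumbel limit."""
    if abs(shape) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / scale))
    z = 1.0 + shape * (x - mu) / scale
    if z <= 0.0:
        # below the lower endpoint (shape > 0) or above the upper one (shape < 0)
        return 0.0 if shape > 0 else 1.0
    return math.exp(-z ** (-1.0 / shape))


def pot_max_cdf(x, u, sigma, xi, lam):
    """P(block maximum <= x) for x > u: exp(-lam * P(excess > x - u))."""
    y = x - u
    if abs(xi) < 1e-12:
        surv = math.exp(-y / sigma)
    else:
        z = 1.0 + xi * y / sigma
        surv = z ** (-1.0 / xi) if z > 0.0 else 0.0
    return math.exp(-lam * surv)


# Hypothetical values: compare both distribution functions for three shapes.
u, sigma, lam = 10.0, 4.0, 3.61
max_diff = max(
    abs(gev_cdf(x, *gev_from_pot(u, sigma, xi, lam)) - pot_max_cdf(x, u, sigma, xi, lam))
    for xi in (0.1, 0.0, -0.1)
    for x in (12.0, 20.0, 35.0)
)
```

The two functions agree up to rounding error, which illustrates that (2) is an exact identity for BM above u rather than an approximation.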
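In the IF approach it is assumed that
$$\Pr\left( \frac{M_s}{\eta_s} \le x \right) = \varphi(x), \tag{3}$$
where $M_s$ represents a typical block maximum at site s, $\eta_s$ is the index variable at site s for $s = 1, \dots, S$, and the common distribution function φ does not depend on the site s. From equation (3) we see that the site-specific quantile function can be written in the following product form:
$$Q_s(\tau) = \eta_s\, \varphi^{-1}(\tau), \tag{4}$$
where $Q_s$ is the quantile function of $M_s$ and τ is the nonexceedance probability.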
 Because it uses more data than those from the site of interest alone, the IF approach can provide quantile estimates that are superior to at-site estimates, even if spatial homogeneity is not entirely achieved after scaling [Cunnane, 1988]. The IF approach was developed for river discharges but can be applied whenever multiple samples of similar data are available, see Hosking and Wallis . In particular, for precipitation data the IF assumption has often been used in combination with the GEV family, see, e.g., Hosking and Wallis , Fowler et al. , and Hanel et al. . To further enhance the usage of the available data, Madsen and Rosbjerg  propose combining the IF assumption with the POT approach.
 A natural analog of relation (3) in the POT setting is that the site-specific exceedances, properly scaled by their index variables, have a common distribution. More formally:
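$$\Pr\left( \frac{X_s}{\eta_s} \le x \,\middle|\, X_s > u_s \right) = \psi(x), \tag{5}$$
where the random variable $X_s$ represents the values at site s, $\eta_s$ is the site-dependent scaling factor (index variable), and ψ does not depend on site s. Note that because $\Pr(X_s/\eta_s \le u_s/\eta_s \mid X_s > u_s) = 0$, and because ψ has a density with mass immediately to the right of $u_s/\eta_s$, it follows that $u_s/\eta_s$ has to be the lower endpoint of the support of ψ for every $s = 1, \dots, S$, i.e.,
$$\frac{u_1}{\eta_1} = \frac{u_2}{\eta_2} = \dots = \frac{u_S}{\eta_S}.$$
This can only be true if the index variable is a multiple of the threshold, i.e., $\eta_s = c\,u_s$ for some positive constant c. Without loss of generality we can take $c = 1$, i.e., $\eta_s = u_s$. This choice of $\eta_s$ also satisfies the IF equation for the excesses, i.e.,
$$\Pr\left( \frac{Y_s}{\eta_s} \le y \,\middle|\, Y_s > 0 \right) = \tilde\psi(y), \tag{7}$$
where $Y_s = X_s - u_s$ and $\tilde\psi(y) = \psi(1 + y)$ is independent of site s.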
 A natural choice for a site-specific threshold is a high empirical quantile of the at-site data [see also J. A. Smith, 1989]. An important consequence of this choice is that the mean number of exceedances per block will be approximately constant over the region, i.e.,
$$\lambda_s = \lambda, \qquad s = 1, \dots, S.$$
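 Under the previous assumptions the distribution of the scaled excesses has the following form:
$$\Pr\left( \frac{Y_s}{u_s} \le y \,\middle|\, Y_s > 0 \right) = 1 - \left( 1 + \frac{\xi_s u_s\, y}{\sigma_s} \right)^{-1/\xi_s}.$$
Equation (7) then implies that we have the following restrictions on the parameters of the GPD:
$$\frac{\sigma_s}{u_s} = \gamma, \qquad \xi_s = \xi, \qquad s = 1, \dots, S, \tag{9}$$
i.e., a common dispersion coefficient γ and a common shape parameter ξ.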
 We would like to obtain an IF model in the BM setting if we transfer the parameters from the IF model in the POT setting, using relationship (2). If the block maxima follow a GEV distribution, it can be shown that the IF assumption is satisfied if the dispersion coefficient and the shape parameter of the GEV distribution are constant over the region, see, e.g., Hanel et al. , i.e.,
$$\frac{\tilde\sigma_s}{\mu_s} = \tilde\gamma, \qquad \tilde\xi_s = \tilde\xi, \qquad s = 1, \dots, S. \tag{10}$$
If we transform the conditions (9) according to relationship (2) and use that λ is constant over the region, we obtain the following conditions on the GEV distribution parameters:
$$\frac{\tilde\sigma_s}{\mu_s} = \frac{\gamma \lambda^{\xi}}{1 + \gamma \left( \lambda^{\xi} - 1 \right)/\xi}, \qquad \tilde\xi_s = \xi.$$
That is, the conditions in (10) are fulfilled.
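The transformation can be written out step by step. Writing $\mu_s$, $\tilde\sigma_s$ and $\tilde\xi_s$ for the GEV location, scale and shape at site s, and inserting (9) and $\lambda_s = \lambda$ into (2), gives

$$
\begin{aligned}
\tilde\xi_s &= \xi, \\
\mu_s &= u_s + \frac{\gamma u_s}{\xi}\left(\lambda^{\xi} - 1\right) = u_s \left[ 1 + \frac{\gamma}{\xi}\left(\lambda^{\xi} - 1\right) \right], \\
\tilde\sigma_s &= \gamma u_s \lambda^{\xi}, \\
\frac{\tilde\sigma_s}{\mu_s} &= \frac{\gamma \lambda^{\xi}}{1 + \gamma\left(\lambda^{\xi} - 1\right)/\xi}.
\end{aligned}
$$

The threshold $u_s$ cancels in the last ratio, which therefore involves only the regional quantities γ, ξ and λ and takes the same value at every site.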
 In summary, we have developed an IF model with only one spatially varying parameter, the threshold $u_s$, while the other parameters ξ, γ, and λ are constant over the region. Note that we choose λ to be constant in the first place and therefore obtain a site-specific threshold. This is different from the model proposed by Madsen and Rosbjerg , where $u_s$ is fixed a priori and only the shape parameter ξ is constant over the region, whereas σ and λ vary over the region, which violates relationship (2). Moreover, their model is only an IF model for the excesses, whereas our model is an IF model for both the exceedances and the excesses, equations (5) and (7), respectively.
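 We get the following GPD model for the excesses:
$$\Pr\left( Y_s \le y \mid Y_s > 0 \right) = 1 - \left( 1 + \frac{\xi y}{\gamma u_s} \right)^{-1/\xi}.$$
Now we can rewrite equation (1) for the $1/\alpha$-year return level at site s as
$$x_{\alpha,s} = u_s \left\{ 1 + \frac{\gamma}{\xi} \left[ \left( \frac{\lambda}{\alpha} \right)^{\xi} - 1 \right] \right\}. \tag{14}$$
As in equation (4), we see the factorization into a site-specific index variable and a site-independent common quantile function.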
2.2. Nonstationary Climate
 There is no general theory for the distribution of extremes of nonstationary data. Approaches to account for long term trends in extremes are usually pragmatic extensions of the extreme value models for stationary data [Coles, 2001]. The classical way to incorporate this nonstationarity in the POT approach is to keep the threshold constant and model the changing exceedance frequency by an inhomogeneous Poisson process and the excesses by a GPD with time-dependent parameters [R. L. Smith, 1989; Coles, 2001; Yiou et al., 2006; Bengtsson and Nilsson, 2007].
 We follow a different route by considering a time-dependent threshold, see, e.g., Yee and Stephenson , Coelho et al. , and Kyselý et al. , such that the process of exceedances can be approximated by a homogeneous Poisson process. We evaluate this approximation by a number of tests (see section 5). A natural way to determine the time-varying threshold is quantile regression, which can be described as a way to identify the temporal evolution of a given quantile in a smooth parametric way, see, e.g., Koenker , Friederichs , and Kyselý et al. . Quantile regression is further discussed in section 3.1. When we take a time-dependent high quantile, estimated by quantile regression, instead of a constant quantile, we can assume that λ is constant over space and time. The time-dependent GPD is used to describe the excesses of the time-varying threshold.
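Hanel et al. generalize the IF assumption to the nonstationary block maxima setting. Following them we generalize (5) in a similar way, which means that after scaling by a time-dependent index variable $\eta_{s,t}$, for every time point the site-specific distribution functions are constant over the region, i.e., for $t = 1, \dots, T$:
$$\Pr\left( \frac{X_{s,t}}{\eta_{s,t}} \le x \,\middle|\, X_{s,t} > u_{s,t} \right) = \psi_t(x), \tag{15}$$
where $\psi_t$ is independent of the site s. As in the stationary case we take the threshold as the index variable, $\eta_{s,t} = u_{s,t}$. Now we can generalize (9) in view of (15) to
$$\frac{\sigma_{s,t}}{u_{s,t}} = \gamma_t, \qquad \xi_{s,t} = \xi_t,$$
and equation (14) can be generalized to the nonstationary setting:
$$x_{\alpha,s,t} = u_{s,t} \left\{ 1 + \frac{\gamma_t}{\xi_t} \left[ \left( \frac{\lambda}{\alpha} \right)^{\xi_t} - 1 \right] \right\}.$$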
As in the stationary case, we can see the factorization into a time and site dependent index variable and a quantile function, which depends on time only.
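As a sketch of the nonstationary return level, the function below multiplies thresholds $u_{s,t}$ by a site-independent growth curve built from $\gamma_t$ and $\xi_t$, i.e., $x_{\alpha,s,t} = u_{s,t}\{1 + \gamma_t[(\lambda/\alpha)^{\xi_t} - 1]/\xi_t\}$. The array layout, the function name and the example values are assumptions made for illustration only.

```python
import numpy as np


def nonstationary_return_level(u, gamma, xi, lam, alpha):
    """x_{alpha,s,t} = u_{s,t} * {1 + gamma_t / xi_t * [(lam/alpha)**xi_t - 1]}.

    u: (S, T) thresholds; gamma, xi: length-T regional dispersion coefficient
    and shape; lam: exceedances per block; alpha: exceedances of the level per block.
    """
    u = np.asarray(u, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    xi = np.asarray(xi, dtype=float)
    log_ratio = np.log(lam / alpha)
    small = np.abs(xi) < 1e-12
    safe_xi = np.where(small, 1.0, xi)
    # site-independent growth curve q_t(alpha); xi -> 0 gives log(lam/alpha)
    growth = np.where(small, log_ratio, np.expm1(safe_xi * log_ratio) / safe_xi)
    q_t = 1.0 + gamma * growth
    return u * q_t[np.newaxis, :]          # index variable times growth curve


# Hypothetical inputs: two sites, 60 seasons, linear threshold trends,
# constant gamma and xi (no trends in the excess distribution).
T = 60
t = np.arange(T)
u = np.vstack([8.0 + 0.05 * t, 9.0 + 0.07 * t])
x25 = nonstationary_return_level(u, np.full(T, 0.35), np.full(T, 0.03), 3.61, 1 / 25)
rel_change_level = x25[:, -1] / x25[:, 0]
rel_change_threshold = u[:, -1] / u[:, 0]
```

With constant γ and ξ, as in this example, the relative change of the return level between the first and last time point equals the relative change of the threshold at every site; time-varying $\gamma_t$ or $\xi_t$ break this proportionality.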
3. Estimation of the Model Parameters
 We have chosen the threshold as a time-dependent high quantile. For the estimation of this quantile we use quantile regression, which is outlined in section 3.1. Section 3.2 describes the composite likelihood framework for estimating the time-dependent parameters of the excess distribution.
3.1. Threshold Estimation
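 Quantile regression relies on the fact that a sample quantile can be viewed as the solution of an optimization problem, which can be computed efficiently using linear programming, as shown by Koenker and Bassett . For a fixed site s we can obtain the τth sample quantile of the observations as
$$\hat q(\tau) = \arg\min_{q \in \mathbb{R}} \sum_{t=1}^{T} \rho_\tau\left( x_{s,t} - q \right), \qquad \rho_\tau(r) = r \left\{ \tau - \mathbf{1}(r < 0) \right\}.$$
In linear quantile regression it is assumed that the τth conditional quantile function for given covariates z has a linear structure, i.e.,
$$Q_{X_{s,t}}(\tau \mid z_t) = z_t^{\top} \beta(\tau), \tag{19}$$
where for a linear trend in time $z_t = (1, t)^{\top}$. Analogously to the sample quantile, one then uses
$$\hat\beta(\tau) = \arg\min_{\beta} \sum_{t=1}^{T} \rho_\tau\left( x_{s,t} - z_t^{\top} \beta \right)$$
as estimator for $\beta(\tau)$, and the threshold is $u_{s,t} = z_t^{\top} \hat\beta(\tau)$. For details of the transformation of this optimization problem into a linear program see Koenker .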
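The linear program behind linear quantile regression can be written down directly: each residual is split into a positive and a negative part, and the check function becomes a linear objective. The sketch below assumes a single site with covariates $z_t = (1, t)^\top$ and uses SciPy's `linprog` with the HiGHS solver on synthetic data; it is a minimal illustration, not the implementation used in this study.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(x, Z, tau):
    """Linear quantile regression solved as a linear program.

    Minimises sum_t rho_tau(x_t - z_t' beta) by writing each residual as
    r_t = p_t - m_t with p_t, m_t >= 0, so the objective becomes
    tau * sum(p) + (1 - tau) * sum(m), subject to Z beta + p - m = x.
    """
    n, k = Z.shape
    c = np.concatenate([np.zeros(k), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)   # beta free, p, m >= 0
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]


# Synthetic daily series with a linear trend (illustration only).
rng = np.random.default_rng(1)
n = 500
t = np.arange(n, dtype=float)
x = 10.0 + 0.05 * t + rng.exponential(1.0, size=n)
Z = np.column_stack([np.ones(n), t])          # z_t = (1, t)
beta = quantile_regression(x, Z, tau=0.96)
threshold = Z @ beta                           # u_t = z_t' beta(tau)
frac_above = np.mean(x - threshold > 1e-6)
```

Since the fitted line is a 96% regression quantile, roughly 4% of the observations lie above it. The dense constraint matrix is only practical for short series; for long records a dedicated routine such as statsmodels' `QuantReg` or the R package quantreg is the more natural choice.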
 Note that the threshold is determined for each site separately and that, given that the linear quantile function (19) holds, we have the following relationship between the mean number of exceedances per block λ and τ:
$$\lambda = \frac{(1 - \tau)\, T}{n_B},$$
where $n_B$ is the number of blocks.
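As a check on this relationship, counting the DJF days from 1 December 1950 to 28 February 2010 (the period of section 5) and setting τ = 0.96 reproduces the 3.61 exceedances per grid box and season quoted there. The helper name below is hypothetical.

```python
from datetime import date


def winter_days(first_december_year, last_february_year):
    """Number of DJF days from 1 Dec of the first year to the end of Feb of the last year."""
    total = 0
    for y in range(first_december_year, last_february_year):
        # 1 Dec of year y up to (not including) 1 Mar of year y + 1
        total += (date(y + 1, 3, 1) - date(y, 12, 1)).days
    return total


tau = 0.96
n_days = winter_days(1950, 2010)      # winters 1950/51 ... 2009/10
n_blocks = 2010 - 1950                # one block per winter season
lam = (1.0 - tau) * n_days / n_blocks
```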
3.2. Excess Distribution Estimation
 Maximum likelihood (ML) estimation is a common approach to estimate the parameters in a statistical model. The ML framework has attractive asymptotic properties. Moreover, it is very flexible, e.g., it is convenient to incorporate covariates. For these reasons several authors recommend it for the estimation of extreme quantiles, especially when trends occur, see, e.g., Coles .
 To estimate the regional parameters γ and ξ of the excess distribution all peaks across the study area are considered simultaneously. For the application of the ML method, the full likelihood function, over all times and sites, is needed. This function is difficult to describe, owing to the spatial dependence and large dimensionality. Moreover, maximization of this function would be virtually impossible. However, if one is interested in the marginal distributions only rather than multivariate extremes, a simplified likelihood for the estimation of the parameters may be used, but standard errors and test procedures have to be adjusted for spatial dependence. In RFA approaches the parameters have sometimes been estimated by the so called independence likelihood, i.e., the likelihood is maximized under the artificial working assumption of spatial independence, see, e.g., Moore , J. A. Smith , Buishand , Cooley et al. , and Hanel et al. . Though this method provides asymptotically unbiased parameter estimates, the spatial dependence leads to an increase of the variance of the estimates compared to the independent case. Especially in the earlier papers the adjustment of the error estimation for spatial dependence was not made. R. L. Smith (Regional estimation from spatially dependent data, unpublished manuscript, 1990) is probably the first reference where standard errors and likelihood ratio tests were adjusted for spatial dependence in an RFA approach. His approach is a special case of the composite likelihood method, see Varin et al.  for an extensive overview. Recently it has been applied by Blanchet and Lehning  to annual maximum snow depths over Switzerland and by Van de Vyver  to annual extremes of precipitation in Belgium.
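 In the nonstationary IF model, the parameters γ and ξ of the excess distribution depend on time. We postulate a certain structure for these parameters, e.g., $\gamma_t = \gamma_0 + \gamma_1 (t - \bar t)$, where $\bar t$ is the mean of the time points, so that $\gamma_0$ is the average of $\gamma_t$ over t. Let θ be the vector of parameters that has to be estimated. The independence likelihood is then given by
$$L_I(\theta) = \prod_{t \in \mathcal{T}} \prod_{s:\, y_{s,t} > 0} h\left( y_{s,t};\, \gamma_t u_{s,t},\, \xi_t \right),$$
where $h(y; \sigma, \xi) = \sigma^{-1} (1 + \xi y/\sigma)^{-1/\xi - 1}$ is the GPD density, $\mathcal{T}$ is the set of days with at least one threshold excess, and the condition on $y_{s,t}$ reflects that we only consider peaks over the threshold. Note that by the choice of the quantile, the threshold has been fixed beforehand.
 The maximum independence likelihood estimator (MILE) is the parameter $\hat\theta$ which maximizes $L_I(\theta)$ or, equivalently, the independence log likelihood
$$\ell_I(\theta) = \sum_{t \in \mathcal{T}} \sum_{s:\, y_{s,t} > 0} \log h\left( y_{s,t};\, \gamma_t u_{s,t},\, \xi_t \right). \tag{20}$$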
We have to optimize this function with respect to the elements of θ. This can be done using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method as implemented in the optim-function of GNU R [R Development Core Team, 2011].
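A minimal sketch of the MILE in Python, assuming linear trends in both γ and ξ, i.e., θ = (γ0, γ1, ξ0, ξ1), is given below. Two choices are ours rather than the paper's: time is centred and divided by T to keep the parameters on comparable scales, and the log likelihood is averaged over the peaks, which does not change the maximizer. SciPy's `minimize(method="BFGS")` plays the role of R's `optim`; the data are synthetic, and −1 marks days without a peak.

```python
import numpy as np
from scipy.optimize import minimize


def gpd_logpdf(y, sigma, xi):
    """Log GPD density at y > 0 (requires 1 + xi*y/sigma > 0); xi -> 0 handled."""
    small = np.abs(xi) < 1e-8
    safe_xi = np.where(small, 1.0, xi)
    w = np.log1p(safe_xi * y / sigma)
    # (1/xi) * log(1 + xi*y/sigma) tends to y/sigma as xi -> 0
    scaled = np.where(small, y / sigma, w / safe_xi)
    return -np.log(sigma) - scaled - np.where(small, 0.0, w)


def neg_indep_loglik(theta, y, u):
    """Minus the independence log likelihood (averaged over peaks) for
    gamma_t = g0 + g1*tt and xi_t = k0 + k1*tt, with tt the centred time / T."""
    g0, g1, k0, k1 = theta
    T = y.shape[1]
    tt = (np.arange(T) - (T - 1) / 2.0) / T
    s_idx, t_idx = np.nonzero(y > 0)                     # peaks only
    yy = y[s_idx, t_idx]
    sigma = (g0 + g1 * tt[t_idx]) * u[s_idx, t_idx]      # sigma_{s,t} = gamma_t u_{s,t}
    xi = k0 + k1 * tt[t_idx]
    if np.any(sigma <= 0) or np.any(1.0 + xi * yy / sigma <= 0):
        return 1e10                                      # outside the parameter space
    return -np.mean(gpd_logpdf(yy, sigma, xi))


# Synthetic data (illustration only): S sites, T days, no true trends.
rng = np.random.default_rng(42)
S, T = 10, 1000
u = 10.0 + 0.002 * np.arange(T)[None, :] + rng.uniform(-1.0, 1.0, size=(S, 1))
gamma_true, xi_true = 0.3, 0.1
V = 1.0 - rng.random((S, T))                                   # uniform on (0, 1]
excess = gamma_true * u / xi_true * (V ** (-xi_true) - 1.0)    # GPD(gamma*u, xi) draws
y = np.where(rng.random((S, T)) < 0.2, excess, -1.0)           # -1 marks "no peak"
peaks = y > 0
start = np.array([np.mean(y[peaks] / u[peaks]), 0.0, 0.0, 0.0])
fit = minimize(neg_indep_loglik, start, args=(y, u), method="BFGS")
```

The starting value uses the mean of the scaled excesses, which is the exponential (ξ = 0) estimate of γ; a reasonable start matters because the penalty value outside the parameter space gives the line search no gradient information.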
 For testing the adequacy of the IF model, it is necessary to consider models with a spatially dependent dispersion coefficient, e.g., $\sigma_{s,t} = \gamma_s u_{s,t}$ and $\xi_t = \xi$. The independence log likelihood for this model is obtained by replacing $\gamma_t$ by $\gamma_s$ and $\xi_t$ by ξ in equation (20). When the number of sites is large, direct optimization of this likelihood with respect to the parameters is computationally very demanding. Therefore we exploit the structure of the independence likelihood by using a profile likelihood approach. In the example above we can, for a given shape parameter, split the optimization over an S-dimensional space into S one-dimensional optimization problems, i.e., the maximization of the log likelihood for the excesses at site s with respect to $\gamma_s$. This is usually much faster. By doing this on a grid of potential values for the shape parameter, one can see the structure of the profile likelihood. Moreover, we can construct a convergent procedure that leads to the estimator of the shape parameter. We recommend the mean of the estimated shape parameters of a site-specific model as initial value for this procedure. Another problem with direct optimization might be the existence of local maxima in the likelihood; with the proposed approach we did not experience any problems with this issue.
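The profile likelihood idea can be sketched for a model with site-specific $\gamma_s$ and a common shape ξ (without trends, for brevity): for fixed ξ each $\gamma_s$ is found by a bounded one-dimensional search, and the resulting profile is then inspected on a grid and maximized over ξ. The function names, search bounds and synthetic data are assumptions, and a bounded Brent search over ξ stands in for the paper's own convergent procedure, which is not specified here.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def site_negll(gamma_s, ys, us, xi):
    """Minus the GPD log likelihood at one site with sigma_t = gamma_s * u_t."""
    sigma = gamma_s * us
    z = 1.0 + xi * ys / sigma
    if np.any(z <= 0):
        return 1e10                                   # outside the support
    if abs(xi) < 1e-8:
        return np.sum(np.log(sigma) + ys / sigma)
    return np.sum(np.log(sigma) + (1.0 / xi + 1.0) * np.log(z))


def profile_loglik(xi, y, u):
    """Profile independence log likelihood at a fixed common shape xi:
    S separate one-dimensional maximisations over gamma_s."""
    total, gammas = 0.0, []
    for s in range(y.shape[0]):
        m = y[s] > 0                                   # peaks at site s
        res = minimize_scalar(site_negll, bounds=(1e-3, 5.0),
                              args=(y[s, m], u[s, m], xi), method="bounded")
        gammas.append(res.x)
        total -= res.fun
    return total, np.array(gammas)


# Synthetic data (illustration only): spatially varying gamma, common xi.
rng = np.random.default_rng(7)
S, T = 5, 2000
u = 10.0 + rng.uniform(-1.0, 1.0, size=(S, 1)) + np.zeros((S, T))
gamma_true = np.array([0.25, 0.3, 0.35, 0.3, 0.4])
xi_true = 0.1
V = 1.0 - rng.random((S, T))
excess = gamma_true[:, None] * u / xi_true * (V ** (-xi_true) - 1.0)
y = np.where(rng.random((S, T)) < 0.2, excess, -1.0)

xi_grid = np.linspace(-0.2, 0.4, 13)                  # shows the profile's shape
profile = [profile_loglik(k, y, u)[0] for k in xi_grid]
opt = minimize_scalar(lambda k: -profile_loglik(k, y, u)[0],
                      bounds=(-0.2, 0.4), method="bounded")
xi_hat = opt.x
gamma_hat = profile_loglik(xi_hat, y, u)[1]
```

Each evaluation of the profile costs S one-dimensional searches, so its cost grows linearly with the number of sites instead of requiring an (S + 1)-dimensional optimization.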
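Under suitable regularity conditions the MILE is asymptotically normally distributed:
$$\sqrt{n_{\mathcal{T}}}\, \left( \hat\theta - \theta \right) \xrightarrow{d} N\left( 0,\, G(\theta)^{-1} \right),$$
where $n_{\mathcal{T}}$ is the number of days with one or more threshold exceedances and $G(\theta)$ is the Godambe information:
$$G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta), \tag{21}$$
where $H(\theta) = -n_{\mathcal{T}}^{-1} E\{\nabla^2 \ell_I(\theta)\}$ is minus the expected Hessian of $\ell_I$ at θ (per day), also referred to as the sensitivity matrix, and $J(\theta) = n_{\mathcal{T}}^{-1} \operatorname{Var}\{\nabla \ell_I(\theta)\}$ is the variability matrix, i.e., the (per-day) covariance matrix of the score $\nabla \ell_I(\theta)$. In the case of spatial independence, we have $H(\theta) = J(\theta)$ and the Godambe information reduces to the Fisher information, i.e., $G(\theta) = H(\theta)$. Here H is estimated by its observed value at $\hat\theta$, $\hat H = -n_{\mathcal{T}}^{-1} \nabla^2 \ell_I(\hat\theta)$, and J as
$$\hat J = \frac{1}{n_{\mathcal{T}}} \sum_{t \in \mathcal{T}} U_t(\hat\theta)\, U_t(\hat\theta)^{\top},$$
where $U_t(\theta) = \nabla \ell_t(\theta)$ and $\ell_t$ is the contribution of day t to $\ell_I$. The latter estimate makes use of the fact that the excesses on different days are independent, see, e.g., Chandler and Bate , Varin et al. , and R. L. Smith (Regional estimation from spatially dependent data, unpublished manuscript, 1990). An estimate of the Godambe information is obtained by plugging in the estimates $\hat H$ and $\hat J$ in equation (21). This estimate is used to assess the uncertainty of the parameters (and quantiles) of the excess distribution, see section 4.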
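The sandwich estimates of H, J and G can be sketched with finite differences, assuming a user-supplied function that returns the per-day contributions $\ell_t(\theta)$ for the days with exceedances. The helper names are hypothetical, and central differences are used only to keep the example short; analytic derivatives would be preferable in practice. The usage applies the sketch to i.i.d. exponential data, for which $\hat H = 1/\hat\sigma^2$ and $\hat G = 1/s^2$ (with $s^2$ the sample variance) can be verified by hand; both approach the Fisher information as the sample grows.

```python
import numpy as np


def num_jacobian(f, theta, h=1e-5):
    """Central-difference Jacobian of a vector-valued f: rows = outputs."""
    theta = np.asarray(theta, dtype=float)
    f0 = np.atleast_1d(f(theta))
    jac = np.empty((f0.size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        jac[:, k] = (np.atleast_1d(f(theta + step)) - np.atleast_1d(f(theta - step))) / (2.0 * h)
    return jac


def godambe(day_loglik, theta_hat):
    """Estimated sensitivity H, variability J and Godambe G = H J^{-1} H.

    day_loglik(theta) returns the per-day contributions l_t(theta); days are
    treated as independent, the sites within a day are not.
    """
    n = np.atleast_1d(day_loglik(theta_hat)).size
    scores = num_jacobian(day_loglik, theta_hat)        # (n, p): U_t(theta_hat)

    def total_grad(th):
        # gradient of the summed log likelihood, as a length-p vector
        return num_jacobian(lambda v: np.sum(day_loglik(v)), th)[0]

    H = -num_jacobian(total_grad, theta_hat) / n       # per-day sensitivity
    H = 0.5 * (H + H.T)                                # symmetrise
    J = scores.T @ scores / n                          # per-day variability
    G = H @ np.linalg.solve(J, H)
    return G, H, J, n


# Check in a case with a known answer: i.i.d. exponential data, one value per "day".
rng = np.random.default_rng(0)
y_days = rng.exponential(2.0, size=5000)


def day_ll(theta):
    sigma = theta[0]
    return -np.log(sigma) - y_days / sigma


theta_hat = np.array([y_days.mean()])                   # MLE of the exponential scale
G, H, J, n = godambe(day_ll, theta_hat)
std_err = np.sqrt(np.diag(np.linalg.inv(G)) / n)
```

With spatially dependent sites, `day_loglik` would sum the GPD log densities of all peaks on a day, so that the correlation between sites enters J through the per-day scores.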
4. Model Selection for the Excess Distribution
 In this section we describe the methods used to investigate the temporal behavior of the dispersion coefficient and the shape parameter as well as the adequacy of the IF model.
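 Information criteria are used as an indication of the suitability of a specific model. Varin et al.  present composite likelihood adaptations of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are defined in the usual way:
$$\text{CL-AIC} = -2\, \ell_I(\hat\theta) + 2 p^*, \qquad \text{CL-BIC} = -2\, \ell_I(\hat\theta) + p^* \log n_{\mathcal{T}},$$
where $n_{\mathcal{T}}$ is the number of days with at least one exceedance and $p^*$ is an effective number of parameters:
$$p^* = \operatorname{tr}\left\{ \hat H \hat G^{-1} \right\} = \operatorname{tr}\left\{ \hat J \hat H^{-1} \right\}.$$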
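A small sketch of the composite information criteria, assuming estimates $\hat H$ and $\hat J$ are available (for instance from the sandwich estimation in section 3.2). Using the number of days with exceedances as n in the BIC penalty is our assumption, and the function name is hypothetical. Under independence H = J and p* equals the nominal number of parameters; positive spatial dependence inflates J relative to H and hence p*, in line with the large values reported in section 5.

```python
import numpy as np


def composite_ic(loglik_hat, H, J, n):
    """Composite-likelihood AIC/BIC with effective number of parameters
    p* = tr(J H^{-1}); H and J must use the same scaling (both per day or
    both totals), and the trace does not depend on which one is used."""
    p_star = float(np.trace(J @ np.linalg.inv(H)))
    aic = -2.0 * loglik_hat + 2.0 * p_star
    bic = -2.0 * loglik_hat + np.log(n) * p_star
    return aic, bic, p_star


# Under spatial independence H = J and p* equals the number of parameters.
aic, bic, p_star = composite_ic(-100.0, 2.0 * np.eye(3), 2.0 * np.eye(3), n=50)
```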
 Moreover, we will test our assumptions using nested models. This means that we consider submodels M0 of the full model M1 obtained by constraining q components of the parameter vector θ. For instance, we may partition $\theta = (\psi^{\top}, \omega^{\top})^{\top}$ such that the q-dimensional component ψ is zero under M0. To test this hypothesis, we use the independence likelihood ratio statistic, which is a special case of a composite likelihood ratio (CLR) statistic [Chandler and Bate, 2007; Varin et al., 2011]:
$$W = 2 \left\{ \ell_I(\hat\theta_1) - \ell_I(\hat\theta_0) \right\},$$
where $\hat\theta_1$ ($\hat\theta_0$) denotes the MILE under model M1 (M0). Varin et al.  present the following asymptotic result for W under the null hypothesis:
$$W \xrightarrow{d} \sum_{j=1}^{q} \nu_j Z_j^2, \tag{23}$$
where the $Z_j$ are independent standard normal variates and $\nu_1, \dots, \nu_q$ are the eigenvalues of
$$\left( H^{\psi\psi} \right)^{-1} G^{\psi\psi}.$$
Here $G^{\psi\psi}$ denotes the submatrix of the inverse Godambe information for the full model M1 pertaining to the parameter vector ψ, and $H^{\psi\psi}$ is defined analogously.
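The p value of W under the asymptotic null distribution (23) can be approximated by simulating the weighted sum of squared standard normal variables. The sketch below (hypothetical function name; Monte Carlo rather than an exact series expansion) takes the inverse Godambe and inverse sensitivity matrices of the full model and the positions of the tested components ψ.

```python
import numpy as np


def clr_pvalue(W, G_inv, H_inv, idx, n_sim=200_000, seed=0):
    """Monte Carlo p value for the composite likelihood ratio statistic W,
    using the null limit W ~ sum_j nu_j Z_j^2 (equation (23)).

    G_inv, H_inv: inverse Godambe and inverse sensitivity matrices of the full
    model M1 (same scaling); idx: positions of the q tested components psi.
    """
    idx = np.asarray(idx)
    G_pp = G_inv[np.ix_(idx, idx)]                   # G^{psi psi}
    H_pp = H_inv[np.ix_(idx, idx)]                   # H^{psi psi}
    nu = np.linalg.eigvals(np.linalg.solve(H_pp, G_pp)).real
    rng = np.random.default_rng(seed)
    sims = (rng.standard_normal((n_sim, idx.size)) ** 2) @ nu
    return float(np.mean(sims >= W)), nu


# Independence case (G = H): the limit is chi-square with q = 1 degree of freedom.
p_indep, nu_indep = clr_pvalue(3.841, np.eye(2), np.eye(2), idx=[1])
# Dependence that doubles the variance of the tested component: nu = 2.
p_dep, nu_dep = clr_pvalue(3.841, np.diag([1.0, 2.0]), np.eye(2), idx=[1])
```

The second call shows why ignoring spatial dependence is problematic: the same W that is borderline significant under independence gives a much larger p value once the eigenvalue reflects the inflated variance.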
 In order to obtain the information criteria and the asymptotic distribution of W under the null hypothesis, we need to estimate the Godambe information, which is difficult when the number of parameters is large. Hence it is not feasible to examine the appropriateness of the IF assumption for regions with many sites, based on the Godambe information.
 One possibility to obtain p values for the test statistic W, without estimating the Godambe information, is to apply a bootstrap procedure, see, e.g., Varin et al. . We follow Hanel et al.  and use a semiparametric bootstrap approach to take the dependence structure into account, without explicitly modeling this. The challenge is to produce bootstrap samples according to the null hypothesis, which exhibit approximately the same spatial dependence structure as the original data set. We assume that the underlying spatial dependence is not changing over time, i.e., only the marginal distributions are changing. One could think of a constant copula generating the dependence structure, i.e., for fixed t,
$$\Pr\left( Y_{1,t} \le y_1, \dots, Y_{S,t} \le y_S \right) = C\left\{ F_{1,t}(y_1), \dots, F_{S,t}(y_S) \right\},$$
where $F_{s,t}$ is the marginal distribution function at site s and time t, and C is a copula; for details on copulas see, e.g., Nelsen . We generate the bootstrap samples in three steps. In the first step we transform the sample of the excesses into a sample that follows approximately the standard exponential distribution:
$$\tilde y_{s,t} = \frac{1}{\hat\xi^{(1)}_{s,t}} \log\left( 1 + \hat\xi^{(1)}_{s,t} \frac{y_{s,t}}{\hat\sigma^{(1)}_{s,t}} \right),$$
where $\hat\sigma^{(1)}_{s,t}$ and $\hat\xi^{(1)}_{s,t}$ are the estimated scale and shape parameters under the full model M1. In the second step we sample with replacement from monthly blocks of the whole spatial domain to obtain a new sample $\tilde y^{*}_{s,t}$ with approximately standard exponential margins and the same spatial dependence structure as that of $\tilde y_{s,t}$. In the third step we use the estimated scale and shape parameters under the null hypothesis, denoted as $\hat\sigma^{(0)}_{s,t}$ and $\hat\xi^{(0)}_{s,t}$, respectively, to transform the sample $\tilde y^{*}_{s,t}$ to a bootstrap sample of the excesses
$$y^{*}_{s,t} = \frac{\hat\sigma^{(0)}_{s,t}}{\hat\xi^{(0)}_{s,t}} \left\{ \exp\left( \hat\xi^{(0)}_{s,t}\, \tilde y^{*}_{s,t} \right) - 1 \right\}.$$
The $y^{*}_{s,t}$ follow approximately the GPD model M0 and mimic the spatial dependence structure of the original excesses.
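The three bootstrap steps can be sketched as follows. To keep the example short, we assume equal-length blocks (e.g., 30-day pseudo-months) instead of calendar months, store days without a peak as NaN, and evaluate the M0 parameters at the original time positions; all names and values are hypothetical.

```python
import numpy as np


def to_exponential(y, sigma, xi):
    """Step 1: GPD(sigma, xi) excesses -> approximately standard exponential."""
    small = np.abs(xi) < 1e-8
    safe_xi = np.where(small, 1.0, xi)
    return np.where(small, y / sigma, np.log1p(safe_xi * y / sigma) / safe_xi)


def from_exponential(e, sigma, xi):
    """Step 3: standard exponential -> GPD(sigma, xi) (inverse of step 1)."""
    small = np.abs(xi) < 1e-8
    safe_xi = np.where(small, 1.0, xi)
    return np.where(small, sigma * e, sigma * np.expm1(safe_xi * e) / safe_xi)


def bootstrap_excesses(y, sig1, xi1, sig0, xi0, block_len, rng):
    """One bootstrap sample of the excesses under M0.

    y: (S, T) excesses with NaN where there is no peak; sig1/xi1 (model M1)
    and sig0/xi0 (model M0) must broadcast to (S, T).  T is assumed to be a
    multiple of block_len (equal-length blocks, a simplification).
    """
    S, T = y.shape
    e = to_exponential(y, sig1, xi1)                     # step 1
    n_blocks = T // block_len
    blocks = e.reshape(S, n_blocks, block_len)
    pick = rng.integers(0, n_blocks, size=n_blocks)       # step 2: whole blocks,
    e_star = blocks[:, pick, :].reshape(S, T)            # all sites together
    return from_exponential(e_star, sig0, xi0)           # step 3, original times


# Hypothetical example: GPD(3, 0.2) excesses at ~10% of the days, M0 exponential.
rng = np.random.default_rng(3)
S, T = 4, 9000
sig1 = np.full((S, T), 3.0)
y = sig1 / 0.2 * ((1.0 - rng.random((S, T))) ** -0.2 - 1.0)
y[rng.random((S, T)) > 0.1] = np.nan
y_star = bootstrap_excesses(y, sig1, 0.2, 2.5, 0.0, block_len=30, rng=rng)
```

Because whole blocks are drawn for all sites at once, the values that occur on the same day at different sites stay together, which is what preserves the spatial dependence.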
 From a number of Monte Carlo experiments, Kyselý [2007, 2009] concluded that the (nonparametric) bootstrap generally resulted in too narrow confidence intervals for large quantiles of the distributions that are commonly used to describe the distribution of precipitation extremes. This has been attributed to the skewness of the estimators of the model parameters in the case of small and moderate sample sizes. This objection might be weakened when using RFA methods, because then the estimation is based on much more data.
5. Application to Precipitation Data
 We applied the regional peaks-over-threshold method to observed precipitation data from the Netherlands. We used the daily, gridded E-OBS data (version 5.0), which were made available by the EU-funded project ENSEMBLES [Haylock et al., 2008]. We consider winter (DJF) precipitation for 25 km × 25 km grid squares centered in the Netherlands for the period 1 December 1950 to 28 February 2010. In total we have 69 grid boxes and 60 winter seasons of daily measurements for each grid box.
 The Netherlands has a maritime climate with relatively mild and humid winters. Figure 1 shows, for each grid box, the mean over the considered period of the largest daily precipitation value in winter (winter maximum). The spatial variation in Figure 1 is small: 80% of the values lie between 18.2 and 20.4 mm. Previous studies propose to view the Netherlands as a homogeneous region to which the IF assumption applies, see, e.g., Overeem et al.  and Hanel et al. .
5.1. Event Selection and Trend in the Threshold
 Daily precipitation in the winter season exhibits some temporal dependence, also at high levels. Both the relation between the GEV and GPD parameters (equation (2)) and the estimation of the variability matrix J rely on the independence assumption; therefore, it is necessary to select a subset of independent events. This is usually achieved by specifying a minimum separation time tsep between exceedances over the threshold. The separation time is determined by the temporal dependence in the data at high levels. This temporal dependence is weak for daily precipitation, and separation times of 1 or 2 days seem to be sufficient, see, e.g., Coles  and Kyselý and Beranová . In the present study a separation time of 1 day was chosen, and the original data were declustered rather than the exceedances. For every s and t we replace $x_{s,t}$ by zero if it is not a local maximum, i.e., if $x_{s,t-1}$ or $x_{s,t+1}$ is larger than $x_{s,t}$. For example we obtain from the sequence the new sequence but the sequence remains unchanged. It is clear that the excesses obtained from the declustered data are also separated by at least 1 day. However, while the method handles temporal dependence for each site separately, it does not handle situations where the peak at grid box A occurs a day later than at grid box B. In most cases these will be two separate rainfall events. The main advantage of this declustering algorithm is that the expected number of exceedances per block λ will be approximately constant, which is a basic assumption of our model. This follows because we determine the threshold for the declustered data via quantile regression, as described in section 3.1.
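The declustering rule can be written in a few lines of NumPy. This sketch keeps a value only if neither neighbouring day at the same site is larger and sets all other values to zero; the boundary convention and the example amounts are our own assumptions, and seasons that are not contiguous in time would have to be processed separately.

```python
import numpy as np


def decluster(x):
    """Replace every value that is not a local maximum by zero (1-day separation).

    x: (S, T) daily values.  x[s, t] is kept only if neither x[s, t-1] nor
    x[s, t+1] is larger; the first and last day are compared with their
    single neighbour (our boundary convention).
    """
    x = np.asarray(x, dtype=float)
    prev = np.full_like(x, -np.inf)
    nxt = np.full_like(x, -np.inf)
    prev[:, 1:] = x[:, :-1]
    nxt[:, :-1] = x[:, 1:]
    keep = (prev <= x) & (nxt <= x)
    return np.where(keep, x, 0.0)


example = decluster([[0.0, 5.0, 12.0, 7.0, 0.0, 3.0]])   # hypothetical amounts
```

In the example the two-day event (5, 12, 7) is reduced to its peak of 12, while the isolated value 3 is kept.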
 We choose the threshold to be the 96% linear regression quantile of the declustered data. Hence, we expect on average 3.61 exceedances per grid box and winter season. The actual sample size varies between 216 and 218 exceedances per grid box, which corresponds to 3.60–3.63 exceedances per grid box and season (the small differences are caused by ties in the daily precipitation amounts). In total we observe 777 days with at least one exceedance. In Figure 2 we show the cumulative sum of the number of exceedances for the successive winter seasons for the grid box around De Bilt, with 95% pointwise tolerance intervals for a homogeneous Poisson process with constant intensity, conditioned on the total number of exceedances (see Lang et al.  for the derivation of the tolerance interval). The figure is typical for the whole region and indicates that a homogeneous Poisson process might be appropriate to describe the occurrence times of the exceedances.
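A pointwise band of this kind can be constructed from the fact that, conditioned on the total number N of events, the cumulative count of a homogeneous Poisson process after k of n equally long seasons is Binomial(N, k/n). The sketch below uses SciPy's binomial quantiles with a hypothetical function name; it is one standard construction and may differ in detail from the derivation in Lang et al.

```python
import numpy as np
from scipy.stats import binom


def cumulative_count_band(counts, level=0.95):
    """Cumulative exceedance counts per season with pointwise bounds for a
    homogeneous Poisson process conditioned on the total count N: the
    cumulative count after k of n seasons is then Binomial(N, k/n)."""
    counts = np.asarray(counts)
    n_seasons, total = counts.size, int(counts.sum())
    a = (1.0 - level) / 2.0
    p = np.arange(1, n_seasons) / n_seasons           # last season is fixed at N
    lower = np.append(binom.ppf(a, total, p), total)
    upper = np.append(binom.ppf(1.0 - a, total, p), total)
    return np.cumsum(counts), lower, upper


# Hypothetical counts: 3 exceedances in each of 60 seasons.
cum, lower, upper = cumulative_count_band(np.full(60, 3))
```

A cumulative curve that leaves the band for several consecutive seasons would point to a change in the exceedance rate, i.e., to a threshold that does not keep λ constant.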
 Moreover, we test for each grid box individually the hypothesis that the number of exceedances in a winter season follows a Poisson distribution. The dispersion index (DI) test exploits the fact that the variance and the mean of the Poisson distribution are equal, see Cunnane  for details. The ratio of the variance and the mean is sensitive to the separation time between exceedances: it tends to be larger than one if the separation time is too short. The Poisson assumption is rejected at the 5% significance level in 2 of the 69 grid boxes, which is in good agreement with the expected number of rejections under the Poisson assumption (0.05 × 69 ≈ 3.5). If the exceedance times come from a homogeneous Poisson process, they should be uniformly distributed on any time interval, conditioned on the number of exceedances, which can be tested by the Kolmogorov-Smirnov test, see, e.g., Cox and Lewis . This test does not reject uniformity in any grid box. The p values of the Kolmogorov-Smirnov test on uniformity of the event times are shown in Figure 3.
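Both tests are available through standard statistical routines. The sketch below computes the dispersion index $D = (n-1)s^2/\bar m$ of the seasonal counts, which is approximately χ² distributed with n − 1 degrees of freedom under the Poisson hypothesis, and applies SciPy's Kolmogorov-Smirnov test to event times rescaled to [0, 1]. The function names are hypothetical, and using a two-sided DI test is our assumption.

```python
import numpy as np
from scipy import stats


def dispersion_index_test(counts):
    """DI test for Poisson counts: D = (n-1) s^2 / mean ~ chi2(n-1) under H0.
    Returns D and a two-sided p value (the sidedness is our choice here)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.size
    d = (n - 1) * counts.var(ddof=1) / counts.mean()
    cdf = stats.chi2.cdf(d, n - 1)
    return d, 2.0 * min(cdf, 1.0 - cdf)


def uniformity_test(event_times, t_start, t_end):
    """Kolmogorov-Smirnov test that event times are uniform on [t_start, t_end],
    as they are for a homogeneous Poisson process given the number of events."""
    scaled = (np.asarray(event_times, dtype=float) - t_start) / (t_end - t_start)
    return stats.kstest(scaled, "uniform")


d_equal, p_equal = dispersion_index_test(np.full(60, 3))         # no variation at all
ks = uniformity_test(np.linspace(0.5, 99.5, 100), 0.0, 100.0)    # evenly spread times
```

Identical counts in every season are underdispersed relative to a Poisson process and give D = 0, while perfectly evenly spread event times give a KS statistic of 0.5/n; real exceedance series lie between these idealized extremes.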
Figure 4 shows for each grid box the mean of the threshold for the 1950–2010 period. The trend in the threshold for the 1950–2010 period is positive over the whole domain, see Figure 5, but is relatively small in the southeastern part of the country and large (up to 0.68 mm per decade) in the west and northern parts, where it is significant at the 5% level.
 In Figure 6 one can observe that the temporal evolution of the thresholds is similar for each of the selected grid boxes. The findings are consistent with Buishand et al.  who found a significant positive trend in the mean precipitation for the winter half year (October–March) in the Netherlands during the period 1951–2009, but did not detect a spatial gradient in the trend of mean winter precipitation.
5.2. Threshold Excesses
 We consider four different models for the excess distribution, three based on the IF approach, A–A″ in Table 1, and one with a spatially varying dispersion coefficient and constant shape parameter, model B.
Table 1. Overview of Models Used
 In a first step we want to infer which of the IF models best describes the data. As a first indication, the information criteria are computed for each of the three IF models, as outlined in section 4, see Table 2. Both the composite AIC and the composite BIC show that incorporating a trend in the dispersion coefficient γ (model A′) does not result in a better model. The AIC selects model A″, which has a (linear) trend in the shape parameter, whereas the BIC selects the simplest model A. The effective number of parameters is much larger than the number of parameters in the traditional AIC and BIC: for model A we obtain $p^* = 70.5$ instead of two for the independent case, and for models A′ and A″ we obtain $p^* = 95.5$ and 89.3, respectively, instead of three. This large difference is due to the strong spatial dependence in the data.
The lowest AIC and BIC values are printed in bold.
The shape parameter is crucial for the estimation of very high quantiles. Model A estimates the shape parameter to be 0.03, i.e., just inside the Fréchet domain. Model A″ estimates a large drop in the shape parameter from 0.10 to −0.09, which would mean a change from the Fréchet family to the Weibull family. To gain more insight into the temporal behavior of the shape parameter, we estimate it for overlapping 20 year subsamples of the data using model A, which has no trend in the model parameters. It appears that a large part of the negative trend in the shape parameter in model A″ is due to one specific event, namely the extreme rainfall of 3 December 1960 (compare also Buishand and Van den Brink and Können), which causes a large drop of the 20 year window estimates in the year 1971, as can be seen in Figure 7.
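The windowed estimates can be imitated with the sketch below. It deliberately swaps in a simpler estimator, namely probability-weighted moments (Hosking and Wallis) applied to a single series of excesses, instead of the composite likelihood fit of model A over all grid boxes, so its numbers would not reproduce Figure 7. Under the IF assumption one could pool grid boxes by dividing each excess by its threshold, because the scaled excesses then share a GPD with scale γ. The function names are hypothetical.

```python
def gpd_pwm(excesses):
    """Probability-weighted-moment estimates (sigma, xi) of a generalized Pareto distribution.

    Hosking-Wallis estimator written with xi > 0 for heavy (Frechet-type) tails:
    xi = 2 - a0 / (a0 - 2 a1) and sigma = 2 a0 a1 / (a0 - 2 a1), where a0 is the
    sample mean and a1 is the unbiased estimate of E[X (1 - F(X))].
    """
    x = sorted(excesses)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two excesses")
    a0 = sum(x) / n
    a1 = sum(v * (n - 1 - i) / (n - 1) for i, v in enumerate(x)) / n
    xi = 2.0 - a0 / (a0 - 2.0 * a1)
    sigma = 2.0 * a0 * a1 / (a0 - 2.0 * a1)
    return sigma, xi


def moving_window_shape(seasons, excesses, width=20):
    """Shape estimates for overlapping windows of `width` consecutive seasons.

    seasons[k] is the (integer) season of excesses[k]. Returns (first_season, xi)
    for every window containing at least three excesses.
    """
    first, last = min(seasons), max(seasons)
    out = []
    for start in range(first, last - width + 2):
        sub = [e for s, e in zip(seasons, excesses) if start <= s < start + width]
        if len(sub) >= 3:
            out.append((start, gpd_pwm(sub)[1]))
    return out
```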
The quantile estimates obtained from model A increase because of the positive trend in the threshold. Because this model has no trend in the dispersion coefficient or the shape parameter, the quantiles are proportional to the threshold, so their relative trend equals that of the threshold. The average increase over the country is 22% for the entire period 1950–2010. In contrast, the trends in the quantiles from model A″ depend on the return period: while the 2 year return level still increases due to the positive trend in the threshold, the 25 year return level decreases due to the negative trend in the common shape parameter, and the 5 year return level is approximately constant (Figure 8). These trends are much harder to interpret than those from model A.
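Why the model A quantiles move in proportion to the threshold follows from the usual POT return-level formula. The sketch below relies on several assumptions: σ = γu; the expected number λ of exceedances per season is constant over time, as it is when the threshold is a fixed quantile; and T is measured in seasons. With γ and ξ fixed, x_T is u times a constant, so the relative increase of x_T equals that of u. Letting ξ depend on time, as in model A″, breaks this proportionality. The function name is hypothetical.

```python
import math


def return_level(u, gamma, xi, lam, T):
    """T-season return level for a threshold u with GPD excesses.

    sigma = gamma * u (index flood scaling); lam is the expected number of
    exceedances per season. Solves lam * (1 + xi * (x - u) / sigma)**(-1 / xi) = 1 / T,
    which requires lam * T > 1.
    """
    sigma = gamma * u
    m = lam * T
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)  # exponential-tail limit xi -> 0
    return u + sigma * (m ** xi - 1.0) / xi
```

For model A″ the same function can be evaluated with a time-dependent ξ(t), and the resulting trend then depends on T.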
The composite likelihood ratio test shows that neither the trend in the dispersion coefficient nor the trend in the shape parameter is significant, although the p values for these two trends are quite different (Table 3). Table 3 also shows that the bootstrap procedure gives results similar to those based on the asymptotic result in equation (23). We stress that the regional approach is more likely to detect a trend if there is one. This can be seen, e.g., by comparing the standard errors of the regional approach with those obtained for the same model in the at-site analysis: for model A′ the standard error of the estimated dispersion trend is 35% smaller, and for model A″ the standard error of the estimated shape trend is 37% smaller than in the at-site approach.
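For a single extra parameter, such as a linear trend in γ or in ξ, the asymptotic null distribution of the composite likelihood ratio statistic W is commonly written as c·χ²₁. Here c is the ratio of the sandwich (inverse Godambe) variance to the naive inverse-Hessian variance of the tested parameter. We assume that this is the one-parameter case of equation (23). The sketch turns it into a p value and also shows how a relative standard-error reduction such as "35% smaller" is computed. The function names are hypothetical.

```python
import math


def clr_pvalue_one_param(w, var_sandwich, var_naive):
    """Asymptotic p value of a composite LR test for one extra parameter.

    Under the null, W is approximately c * chi2_1 with c = var_sandwich / var_naive,
    and P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    c = var_sandwich / var_naive
    return math.erfc(math.sqrt(w / (2.0 * c)))


def se_reduction(se_regional, se_at_site):
    """Relative reduction of a standard error, e.g. 0.35 for '35% smaller'."""
    return 1.0 - se_regional / se_at_site
```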
Table 3. p Values of the CLR Test Against Model A (2500 Samples)
In the second step we test the IF assumption. To this end, we compute the composite likelihood ratio test for the full model B against the nested model A. As explained earlier, the Godambe information cannot be estimated well for model B; hence we use only the bootstrap procedure. We obtain a p value of 0.103 for 2500 bootstrap samples, so the IF assumption is not rejected. Note that, because of the large difference in the number of parameters between model B and model A, the composite likelihood ratio test will not have much power, owing to the large number of alternatives. This is an intrinsic problem when comparing regional models with site-dependent parameters.
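A bootstrap that respects the spatial dependence can resample whole seasons, so that every grid box in a drawn season comes with its neighbors. The sketch below shows only this resampling step and the Monte Carlo p value. The paper's exact bootstrap scheme for the CLR test, including how the null model is imposed, is described elsewhere and is not reproduced here. The `statistic` callable and the function names are hypothetical.

```python
import random


def resample_seasons(seasons, rng):
    """One bootstrap sample: whole seasons drawn with replacement.

    `seasons` is a list whose entries hold the data of all grid boxes for one
    season; resampling entire seasons keeps the spatial dependence intact.
    """
    return [seasons[rng.randrange(len(seasons))] for _ in seasons]


def bootstrap_distribution(seasons, statistic, n_boot=2500, seed=1):
    """Bootstrap replicates of a user-supplied test statistic."""
    rng = random.Random(seed)
    return [statistic(resample_seasons(seasons, rng)) for _ in range(n_boot)]


def monte_carlo_pvalue(observed, replicates):
    """Share of bootstrap replicates at least as large as the observed statistic."""
    return sum(r >= observed for r in replicates) / len(replicates)
```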
In the following we focus on model A, i.e., no trend in the dispersion coefficient and shape parameter. Before looking more closely at return levels and their associated uncertainty, we investigate the adequacy of the model and the spatial dependence using the seasonal maxima of the standard exponential residuals defined by equation (24). The maximum for grid box s in season j is denoted by $M_{s,j}$. From equation (2) it follows that the $M_{s,j}$ are approximately Gumbel distributed with location parameter $\log\lambda$ and scale parameter 1, censored at zero, where $\lambda$ is the expected number of threshold exceedances per season. The distribution has to be censored to account for seasons without any exceedance. The empirical distribution of the $M_{s,j}$ can be compared with the theoretical distribution in a Gumbel plot. Figure 9 shows the spatial mean of the empirical distributions. Apart from the outlier at the largest rank, which is due to the December 1960 event, there is good correspondence between the averaged observed distribution and the postulated Gumbel distribution. Analogous to Dales and Reed, see also Reed and Stewart, the degree of spatial dependence is determined by comparing the distribution of the seasonal spatial maximum, i.e., the largest seasonal maximum over the region in season j, $\tilde{M}_j = \max_s M_{s,j}$, with the distribution of the seasonal maximum of an individual grid box. Based on this comparison, an effective (spatial) sample size, i.e., the number of independent grid boxes, can be computed, which is a measure of joint tail dependence. For fully spatially dependent data, the $\tilde{M}_j$ follow the same Gumbel distribution as the maxima at an individual grid box. In the absence of spatial dependence, however, the location parameter increases to $\log\lambda + \log S$, where $S = 69$ is the number of grid boxes. The empirical distribution of the $\tilde{M}_j$, as shown in Figure 9, indicates that the data are neither fully spatially dependent nor independent.
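The residual transformation and the censored Gumbel law can be sketched as follows. We assume that the residuals of equation (24) take the standard form Z = ξ⁻¹ log(1 + ξY/σ) for an excess Y, which is standard exponential when Y follows the fitted GPD. We also assume that the number of exceedances per season is Poisson with mean λ. The Weibull plotting positions i/(n + 1) are our choice, and the function names are hypothetical.

```python
import math


def exp_residual(excess, sigma, xi):
    """Transform a GPD excess to a standard exponential residual."""
    if abs(xi) < 1e-12:
        return excess / sigma
    return math.log1p(xi * excess / sigma) / xi


def censored_gumbel_cdf(z, lam):
    """P(M <= z) for the seasonal maximum M of the exponential residuals.

    With a Poisson(lam) number of exceedances per season and M = 0 when there
    is none, P(M <= z) = exp(-lam * exp(-z)) for z >= 0: a Gumbel law with
    location log(lam) and scale 1, censored at zero.
    """
    if z < 0:
        return 0.0
    return math.exp(-lam * math.exp(-z))


def gumbel_plot_positions(maxima):
    """Sorted maxima paired with Gumbel reduced variates -log(-log(i / (n + 1)))."""
    m = sorted(maxima)
    n = len(m)
    return [(-math.log(-math.log((i + 1) / (n + 1))), x) for i, x in enumerate(m)]
```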
To determine the effective sample size $S_e$, we fit a Gumbel distribution to the seasonal spatial maxima, keeping the scale parameter fixed at 1. The fitted location parameter $\hat{\mu}$ then equals the location parameter of a single grid box, $\log\lambda$, plus $\log S_e$. Hence $S_e = \exp(\hat{\mu} - \log\lambda)$, which gives an effective sample size of almost 4 for the data in Figure 9.
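The effective sample size can be computed from the seasonal spatial maxima as sketched here. For a Gumbel distribution with scale fixed at 1, setting the derivative of the log likelihood to zero gives the closed form μ̂ = −log((1/n) Σ exp(−x_j)). The sketch ignores the censoring at zero. This is assumed to be harmless for regional maxima, because a season without any exceedance anywhere in the region should be rare. The value of λ must be supplied by the user, and the function names are hypothetical.

```python
import math


def gumbel_location_fixed_scale(x):
    """Maximum likelihood location of a Gumbel distribution with scale fixed at 1."""
    return -math.log(sum(math.exp(-v) for v in x) / len(x))


def effective_sample_size(regional_maxima, lam):
    """S_e = exp(mu_hat - log(lam)) from the seasonal maxima over the region."""
    return math.exp(gumbel_location_fixed_scale(regional_maxima) - math.log(lam))
```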
The influence of the spatial dependence on the uncertainty of the parameters is, however, directly related to the variability matrix J in equation (21) and not to $S_e$. Figure 10 compares, for a particular grid box, the estimated return levels of the excess distribution based on the site-specific approach with those obtained from the RFA. Pointwise confidence bands for the return levels, based on the delta method and the asymptotic normality of the MILE, are also given. The quantile estimates of the two methods are quite similar, but the IF approach reduces the uncertainty in the estimation by 37.5%. Figure 10 also shows that much tighter confidence bands are obtained if no adjustment for spatial dependence is made.
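The delta-method bands can be sketched for the excess return level γu{(λT)^ξ − 1}/ξ. This assumes the model A parameterization by (γ, ξ) at a grid box with threshold u. The 2×2 covariance matrix is an input: supplying the naive inverse Hessian gives the narrower, unadjusted band, and supplying the sandwich matrix H⁻¹JH⁻¹ gives the band that accounts for spatial dependence. The function names are hypothetical, and ξ ≠ 0 is required.

```python
import math


def excess_return_level(gamma, xi, u, lam, T):
    """Return level of the excesses: sigma * ((lam*T)**xi - 1) / xi with sigma = gamma*u."""
    m = lam * T
    return gamma * u * (m ** xi - 1.0) / xi


def delta_method_se(gamma, xi, u, lam, T, cov):
    """Delta-method standard error of the excess return level (xi != 0).

    cov is the 2x2 covariance of (gamma, xi): the naive inverse Hessian, or the
    sandwich (inverse Godambe) matrix H^-1 J H^-1 that accounts for spatial dependence.
    """
    m = lam * T
    d_gamma = u * (m ** xi - 1.0) / xi
    d_xi = gamma * u * (m ** xi * math.log(m) / xi - (m ** xi - 1.0) / xi ** 2)
    g = (d_gamma, d_xi)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(2) for j in range(2))
    return math.sqrt(var)
```

A pointwise 95% band is then the return level ± 1.96 times this standard error.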
Figure 11 shows, for one grid box, the temporal evolution of the threshold, based on at-site estimation, and of the 25 year return level, based on both the at-site estimation and the IF approach. The figure also shows confidence bands for the threshold and the 25 year return level. The confidence band for the threshold is based on the two-sided 95% confidence interval using a Huber sandwich estimate, as implemented in the quantreg package (option se = "nid") [Koenker, 2005]. The confidence band for the 25 year return level is obtained by adding the lower (upper) limits of the 95% confidence band for the threshold to the lower (upper) limits of the 95% confidence band for the 25 year return level of the excesses (shown in Figure 10). By a simple Bonferroni argument, this band has a coverage probability of at least 90%. The uncertainty in the threshold is small compared with the uncertainty in the return level.
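The coverage argument amounts to two lines of arithmetic. If each band misses its target with probability at most 0.05, the union bound (Bonferroni inequality) implies that both bands cover simultaneously with probability at least 0.90, and whenever both cover, the summed band covers the sum. The sketch below is minimal, and the function names are hypothetical.

```python
def combined_band(thr_lo, thr_hi, exc_lo, exc_hi):
    """Band for threshold + excess return level, obtained by adding the limits."""
    return thr_lo + exc_lo, thr_hi + exc_hi


def bonferroni_coverage(*levels):
    """Lower bound on the joint coverage of several confidence statements."""
    return max(0.0, 1.0 - sum(1.0 - c for c in levels))
```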
 An index flood approach for nonstationary peaks-over-threshold data has been developed. The threshold is chosen to be a large quantile that varies over time, which is also taken as the index variable. The peaks exceeding the threshold are described by generalized Pareto distributions. The index flood assumption implies that the ratio of the scale parameter to the threshold and the shape parameter are constant over the region but may vary over time. A consequence of this is that the ratio between different return levels is constant over the region.
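The last statement can be made explicit with a short derivation. It assumes the standard POT return-level expression and the same expected number λ of exceedances per season at every grid box, which holds when every threshold is the same quantile level. The notation $u_s(t)$, $\gamma(t)$, $\xi(t)$ is ours and may differ from the paper's.

```latex
\begin{align}
x_T(s,t) &= u_s(t) + \frac{\sigma_s(t)}{\xi(t)}\left\{(\lambda T)^{\xi(t)} - 1\right\},
  \qquad \sigma_s(t) = \gamma(t)\,u_s(t),\\
         &= u_s(t)\left[1 + \frac{\gamma(t)}{\xi(t)}\left\{(\lambda T)^{\xi(t)} - 1\right\}\right],\\
\frac{x_{T_1}(s,t)}{x_{T_2}(s,t)} &=
  \frac{1 + \gamma(t)\left\{(\lambda T_1)^{\xi(t)} - 1\right\}/\xi(t)}
       {1 + \gamma(t)\left\{(\lambda T_2)^{\xi(t)} - 1\right\}/\xi(t)},
\end{align}
```

The ratio does not involve $s$, so it is the same at every grid box in the region.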
 The approach was applied to gridded, observed daily precipitation data from the Netherlands for the winter season. A linear increase in the threshold was found, which is significant in the western and northern parts of the country. An apparent trend in the shape parameter was observed, which turned out to be mainly due to one exceptional event. This trend was not significant at the 10% level and was therefore disregarded. There was no evidence of a change in the ratio of the GPD scale parameter to the threshold, which means that the increase in the threshold is accompanied by an increase in the scale parameter. Therefore, we conclude that rare extremes are increasing proportionally to the increase in the threshold.
Although the index flood approach considerably reduced the uncertainty in the estimation of the excess distribution, the remaining uncertainty is still substantial, owing to the spatial dependence. The uncertainty could possibly be reduced further by considering longer records or by extending the region, for instance by including the neighboring part of northern Germany in the analysis. However, the presented model may not apply to larger regions, owing to the homogeneity constraints. In particular, the different trends in the index variable indicate that one should be very careful when extending the region. Moreover, one should keep in mind that the effective sample size may grow only slowly with the size of the region, owing to the spatial dependence.
The validity of the bootstrap might be questionable and should be assessed by a Monte Carlo experiment that includes the spatial dependence. For peaks-over-threshold data, however, such an experiment is much more computationally demanding than for block maxima.
 The research was supported by the Dutch research program Knowledge for Climate. We thank three anonymous reviewers for their helpful comments. We acknowledge the E-OBS data set from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com) and the data providers in the ECA&D project (http://eca.knmi.nl). All calculations were performed using the R environment (http://www.r-project.org).