A test for network-wide trends in rainfall extremes


  • Agne Burauskaite-Harju,

    Corresponding author
    1. Division of Statistics, Department of Computer and Information Science, Linköping University, SE-58183 Linköping, Sweden
  • Anders Grimvall,

    1. Division of Statistics, Department of Computer and Information Science, Linköping University, Linköping, Sweden
  • Claudia von Brömssen

    1. Unit for Statistics and Mathematics, Department of Economics, Swedish University of Agricultural Sciences, Uppsala, Sweden


Temporal trends in meteorological extremes are often examined by first reducing daily data to annual index values, such as the 95th or 99th percentiles. Here, we report how this idea can be elaborated to provide an efficient test for trends at a network of stations. The initial step is to make separate estimates of tail probabilities of precipitation amounts for each combination of station and year by fitting a generalised Pareto distribution (GPD) to data above a user-defined threshold. The resulting time series of annual percentile estimates are subsequently fed into a multivariate Mann-Kendall (MK) test for monotonic trends. We performed extensive simulations using artificially generated precipitation data and noted that the power of tests for temporal trends was substantially enhanced when ordinary percentiles were replaced by GPD-based percentiles. Furthermore, we found that trend detection was robust to misspecification of the extreme value distribution. An advantage of the MK test is that it can accommodate non-linear trends and take into account the dependencies between stations in a network. To illustrate our approach, we used long time series of precipitation data from a network of stations in The Netherlands. Copyright © 2010 Royal Meteorological Society

1. Introduction

It is generally believed that global warming causes an increase in the frequency and intensity of extreme weather events. Data representing such changes can be derived from process-based global circulation models (Kharin and Zwiers, 2000; Semenov and Bengtsson, 2002). Moreover, a number of investigators have compiled high-quality datasets and summarised indications of temporal variations in extreme precipitation and daily minimum and maximum temperatures (Groisman et al., 1999; Haylock and Nicholls, 2000; Manton et al., 2001; Klein Tank and Können, 2003; Griffiths et al., 2003; Haylock et al., 2005; Moberg and Jones, 2005; Goswami et al., 2006). The statistical methods that were used in the cited empirical studies are mainly descriptive in nature. In general, this means that the collected time series of daily weather data are first reduced to annual indices, such as 90th or 95th percentiles, and then the obtained sequences of index values are summarised in tables and graphs.

Extreme value theory (EVT) is widely used in climatology to estimate return periods of extreme events (van den Brink et al., 2005; Della-Marta et al., 2009) and in conjunction with risk assessment. It is also applied to homogenise climate data (Della-Marta and Wanner, 2006). However, there are very few studies of climate change that take advantage of EVT (Sugahara et al., 2009). This has previously been pointed out by IPCC (2002). In the cited report, it was also mentioned that regional analysis of extremes calls for further research.

The existing statistical literature on trends in extremes is focused considerably more on EVT. Several researchers have advocated that the problem of detecting and testing for trends in environmental and meteorological extremes is best addressed by using probability models in which the parameters of some extreme value distribution are permitted to vary in a simple fashion with the year of investigation. Smith (1989, 1999, 2001, 2003) has proposed models in which the location and scale parameters of such probability distributions vary linearly or periodically over the study period. Other statisticians have emphasised the need for less rigid models and non-parametric analysis of temporal trends when fitting parametric models to extreme values (Hall and Tajvidi, 2000; Davison and Ramesh, 2000; Pauli and Coles, 2001; Chavez-Demoulin and Davison, 2005; Yee and Stephenson, 2007), but none of those authors has suggested any formal significance test for the presence of trends in extremes at a network of stations.

The aim of the present work was to develop a test for trends in extremes that can accommodate non-linear trends and integrate information from a network of stations. To make our test easy to comprehend, we adhered to the widespread practice of computing annual percentiles, i.e. the first step in our method is to perform separate analyses of subsets of data representing one year of daily weather records from a single station. However, in contrast to the current tradition in climatology, annual percentiles are computed by first fitting a generalised Pareto distribution (GPD) to data above a user-defined threshold and then utilising mathematical relationships between percentiles and the shape and scale parameters of such distributions. The percentiles computed in this manner are subsequently fed into a standard Mann-Kendall test for trends in multiple time series of data (Hirsch and Slack, 1984). By pooling information from a network of comparable stations, it may be possible to detect trends in extremes that would otherwise have been overlooked.

It is well known that ordinary annual percentiles computed from daily precipitation records exhibit substantial interannual variation. Accordingly, we undertook extensive simulations to determine whether the variation in annual GPD-based percentiles is less irregular. Furthermore, we investigated how the performance of our trend test is influenced by misspecification errors in the extreme value distribution. We also examined whether separate fitting of GPDs to subsets of data representing a single year leads to substantial loss of information. A collection of long time series of precipitation data from a network of stations in The Netherlands was used to illustrate our approach.

2. Observational data

We selected twelve high-quality time series of daily precipitation records from the European Climate Assessment (ECA) dataset (Klein Tank et al., 2002). All stations were located in The Netherlands and had complete records for the period 1905–2004. The homogeneity of these and other ECA data series has previously been examined by Wijngaard et al. (2003).

Figure 1 illustrates time series of annual medians and 98th percentiles of our dataset. As can be seen, there is a substantial difference in interannual variation between the medians and higher percentiles, and the graphs indicate that the trends may differ as well.

Figure 1.

Annual medians (a) and (c), and 98th percentiles (b) and (d) of daily precipitation on rainy days (precipitation ≥ 1 mm) in The Netherlands. Data from West Terschelling meteorological station (a) and (b), and a network of twelve meteorological stations (c) and (d). Each curve represents one station

3. Methodology

The procedure we propose for detection of trends in extremes in multiple time series of meteorological data comprises the following steps: (1) Partitioning the given data into subsets, normally one subset for each year and site; (2) Fitting GPDs to the exceedances over a user-defined high threshold in each subset of data; (3) Calculating percentiles from the estimated shape and scale parameters of the fitted GPDs; and (4) Analysing temporal trends in the computed sequences of percentiles.
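Step (1) is plain bookkeeping. The sketch below assumes that daily records are available as (station, date, amount) tuples, a hypothetical layout, and keeps only rainy days (at least 1 mm), the convention used for Figure 1. The function name and the second station name are ours.

```python
from collections import defaultdict
from datetime import date


def partition_by_station_year(records, wet_day_mm=1.0):
    """Step (1): one subset of rainy-day amounts per (station, year).

    `records` is an iterable of (station, date, amount_mm) tuples; this layout
    is hypothetical. Days below `wet_day_mm` are dropped (the >= 1 mm rule).
    """
    subsets = defaultdict(list)
    for station, day, amount in records:
        if amount >= wet_day_mm:
            subsets[(station, day.year)].append(amount)
    return dict(subsets)


records = [
    ("West Terschelling", date(1905, 1, 1), 0.4),   # dry day, dropped
    ("West Terschelling", date(1905, 1, 2), 3.2),
    ("West Terschelling", date(1906, 5, 1), 12.0),
    ("Station B", date(1905, 7, 3), 1.0),            # hypothetical station
]
subsets = partition_by_station_year(records)
```

Each subset can then be passed to the threshold, fitting, and percentile steps independently.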

Further details concerning the extreme value distributions, percentiles, and trend tests are given below.

3.1. Extreme value distributions

Given a threshold $u$, the excess distribution $F_u$ of a random variable $X$ with distribution $F$ is the distribution of $Y = X - u$ conditioned on $X > u$. Accordingly, we can write

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}, \qquad y \ge 0. \tag{1}$$

For a sufficiently high u value, the GPD provides a good approximation of the excess distribution Fu (Balkema and de Haan, 1974; Pickands, 1975). The cumulative distribution function of a GPD is usually written

$$G_{\sigma,\xi}(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}, \tag{2}$$

where $\sigma > 0$ is a scale parameter, $\xi \neq 0$ a shape parameter, and the support is $y \ge 0$ with $1 + \xi y/\sigma > 0$.

The case $\xi = 0$ is obtained as the limit $\xi \to 0$, which, by elementary calculus, gives

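$$G_{\sigma,0}(y) = \lim_{\xi \to 0} G_{\sigma,\xi}(y) = 1 - \exp\left(-\frac{y}{\sigma}\right), \qquad y \ge 0. \tag{3}$$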

In other words, the GPD is then identical to an exponential distribution with mean $\sigma$. Positive values of $\xi$ produce heavy-tailed distributions, whereas negative values produce distributions constrained to the interval $(0, -\sigma/\xi)$. Some techniques for selecting the threshold $u$ have been summarised by Smith (2001).
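Equations (2) and (3) can be evaluated together as below. This is a minimal sketch, not code from the study, and the function name `gpd_cdf` is hypothetical. Returning 1 beyond the upper endpoint −σ/ξ when ξ < 0 reflects the bounded support noted above.

```python
import math


def gpd_cdf(y, sigma, xi):
    """CDF of the generalised Pareto distribution (Equations (2) and (3))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0:
        return 0.0
    if xi == 0.0:
        # Exponential limit, Equation (3)
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0.0:
        # Beyond the upper endpoint -sigma/xi of a finite-tailed GPD (xi < 0)
        return 1.0
    return 1.0 - z ** (-1.0 / xi)
```

Comparing `gpd_cdf(10, 40, 1e-8)` with `gpd_cdf(10, 40, 0)` is a quick numerical check that Equation (2) approaches Equation (3) as ξ tends to zero.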

3.2. Ordinary and GPD-based percentiles

Let $y_1 \le y_2 \le \dots \le y_N$ be an ordered sample from a probability distribution with cumulative distribution function (cdf) $F$, and define an empirical cdf $\hat{F}$ by setting

equation image(4)

and connecting the points $(y_k, \hat{F}(y_k))$, $k = 1, \dots, N$, with straight lines. Then we can estimate the 100$p$th percentile of $F$ by setting

$$\hat{y}(p) = \hat{F}^{-1}(p), \tag{5}$$

where $\hat{F}^{-1}$ denotes the inverse of $\hat{F}$. In the following, this empirical percentile is referred to as the ordinary percentile of our sample.
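Because the plotting position in Equation (4) is not reproduced here, the sketch below assumes the common choice F̂(y_k) = k/(N + 1). Under that assumption, inverting the piecewise-linear empirical cdf amounts to linear interpolation between neighbouring order statistics; other plotting positions shift the result slightly. Clamping values of p that fall outside the range of the plotting positions is also our choice.

```python
def ordinary_percentile(sample, p):
    """Ordinary 100p-th percentile by inverting a piecewise-linear empirical cdf.

    Assumes the plotting position F_hat(y_k) = k / (N + 1); the form used in
    Equation (4) may differ. p outside [1/(N+1), N/(N+1)] is clamped to the
    smallest or largest observation.
    """
    y = sorted(sample)
    n = len(y)
    if n == 0:
        raise ValueError("empty sample")
    pos = p * (n + 1)              # fractional rank at which F_hat equals p
    if pos <= 1:
        return y[0]
    if pos >= n:
        return y[-1]
    k = int(pos)                   # lower neighbouring rank (1-based)
    frac = pos - k
    return y[k - 1] + frac * (y[k] - y[k - 1])
```

With about 155 rainy days per year, the 99th percentile sits between the two largest observations under this plotting position, which illustrates why ordinary high percentiles fluctuate so much from year to year.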

Let us now assume that the exceedances of a threshold u can be described by a GPD with parameters ξ and σ. The inverse of such a cdf can then be written

$$G_{\sigma,\xi}^{-1}(q) = \frac{\sigma}{\xi}\left[(1 - q)^{-\xi} - 1\right], \qquad 0 \le q < 1, \tag{6}$$

and GPD-based percentiles can be computed according to

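$$\hat{y}_{\mathrm{GPD}}(p) = u + \frac{\hat{\sigma}}{\hat{\xi}}\left[\left(\frac{N}{N_u}(1 - p)\right)^{-\hat{\xi}} - 1\right], \tag{7}$$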

where $N$ is the total number of observations in the subset, $N_u$ the number of observations above the threshold $u$, and $\hat{\xi}$ and $\hat{\sigma}$ are maximum-likelihood estimates of the model parameters. Estimates of the GPD parameters, and thus GPD-based percentiles, are known to be sensitive to the choice of threshold; however, additional analyses showed that the trend test results are robust to moderate changes in the threshold value.
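Once σ̂ and ξ̂ are available, the GPD-based percentile follows from inverting the tail approximation P(X > y) ≈ (N_u/N)[1 − G(y − u)] for y > u, which gives u + (σ̂/ξ̂)[((N/N_u)(1 − p))^(−ξ̂) − 1]. The sketch below assumes N is the number of observations in the subset; the ξ = 0 branch uses the exponential limit, and the function name is hypothetical.

```python
import math


def gpd_percentile(p, u, sigma, xi, n_total, n_exceed):
    """GPD-based 100p-th percentile from fitted tail parameters (Equation (7))."""
    if not 0 < n_exceed <= n_total:
        raise ValueError("need 0 < n_exceed <= n_total")
    tail = (n_total / n_exceed) * (1.0 - p)   # conditional exceedance probability above u
    if not 0.0 < tail <= 1.0:
        raise ValueError("p must correspond to a level above the threshold")
    if xi == 0.0:
        return u - sigma * math.log(tail)      # exponential limit
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)
```

The check on `tail` encodes the requirement that the GPD model only describes percentiles above the threshold, i.e. p ≥ 1 − N_u/N.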

3.3. Tests for temporal trends

The presence of monotonic trends in sequences of percentiles from a network of stations was assessed by a Mann-Kendall (MK) test.

In the univariate case, the MK test for an upward or downward trend in a data series $\{Z_k,\ k = 1, 2, \dots, n\}$ is based on the test statistic

$$T = \sum_{1 \le j < k \le n} \operatorname{sgn}(Z_k - Z_j). \tag{8}$$

Achieved significance levels can be derived from a normal approximation of the test statistic. Provided that the null hypothesis is true (i.e. that all permutations of the data are equally likely) and that n is large, T is approximately normal with mean 0 and variance n(n − 1)(2n + 5)/18.
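A direct O(n²) sketch of the univariate statistic and its normal approximation follows. The variance formula assumes no ties, and no continuity correction is applied; both are simplifications on our part.

```python
import math


def mann_kendall(z):
    """Univariate Mann-Kendall statistic (Equation (8)) with a normal approximation.

    Returns (T, variance, z_score, two_sided_p). The variance n(n-1)(2n+5)/18
    is the no-ties value; no continuity correction is applied.
    """
    n = len(z)
    t = 0
    for j in range(n - 1):
        for k in range(j + 1, n):
            diff = z[k] - z[j]
            t += (diff > 0) - (diff < 0)     # sgn(z_k - z_j)
    var = n * (n - 1) * (2 * n + 5) / 18
    z_score = t / math.sqrt(var)
    p_two_sided = math.erfc(abs(z_score) / math.sqrt(2))   # 2 * (1 - Phi(|z|))
    return t, var, z_score, p_two_sided
```

For a strictly increasing series of length 10, every pair contributes +1, so T = 45 with variance 125.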

In the multivariate case, the presence of an overall monotonic trend can be assessed by computing the sum of the MK statistics for all coordinates. Provided the null hypothesis is true and n is large, this sum will also be normally distributed with mean zero. However, its variance will be strongly influenced by the interdependence of the coordinates. Therefore, we used a technique proposed by Hirsch and Slack (1984) and Loftis et al. (1991) to estimate the covariance of two MK statistics and compute achieved significance levels (see Appendix).

4. Simulation studies

We conducted a set of simulation studies to examine the performance of ordinary and GPD-based percentiles and to determine how such percentiles can be incorporated into univariate tests for temporal trends in extremes.

4.1. Precision and accuracy of ordinary and GPD-based percentiles

The precision and accuracy of ordinary and GPD-based percentiles were assessed both for samples from known probability distributions and for outputs from a stochastic weather generator.

The first type of data comprised 3000 samples from each of the following distributions, where GPD(σ, ξ) denotes a GPD with scale σ and shape ξ: (1) a heavy-tailed distribution, GPD(40, 0.2); (2) a finite-tailed distribution, GPD(40, −0.2); and (3) a mixture of these two distributions. The sample length was normally distributed with mean 155 and standard deviation 13, corresponding to the number of rainy days per year at Heathrow Airport in the UK.
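Samples of this kind can be drawn by inverting the GPD cdf, y = (σ/ξ)[(1 − U)^(−ξ) − 1] with U uniform on [0, 1). The sketch covers the two pure cases only, because the mixing proportions for case (3) are not stated. Function names are hypothetical, and rounding the normally distributed sample length to an integer of at least one is our choice.

```python
import math
import random


def sample_gpd(sigma, xi, size, rng):
    """Draw `size` GPD(sigma, xi) variates by inverting the cdf in Equation (2)."""
    draws = []
    for _ in range(size):
        v = 1.0 - rng.random()                     # uniform on (0, 1]
        if xi == 0.0:
            draws.append(-sigma * math.log(v))     # exponential limit
        else:
            draws.append((sigma / xi) * (v ** (-xi) - 1.0))
    return draws


def sample_year(sigma, xi, rng, mean_days=155.0, sd_days=13.0):
    """One synthetic year: a normally distributed number of rainy days, each with a GPD amount."""
    n_days = max(1, round(rng.gauss(mean_days, sd_days)))
    return sample_gpd(sigma, xi, n_days, rng)
```

For example, `[sample_year(40.0, 0.2, rng) for _ in range(3000)]` with `rng = random.Random(seed)` mimics case (1); for case (2), every draw stays below the upper endpoint −σ/ξ = 200.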

Synthetic weather data were created using the EARWIG weather generator (Kilsby et al., 2007), which is based on the Neyman-Scott rectangular pulses rainfall model. More specifically, we generated 3000 samples of precipitation data, each representing one year of daily rainfall records (precipitation ≥ 1 mm) for a 10 × 10 km area around Heathrow Airport.

For each time series and year, ordinary percentiles were computed according to Equation (5). Moreover, GPD-based percentiles were computed according to Equation (7) after GPDs had been fitted to the largest 20% of the observations in each year. Diagnostic tools for GPD fit (Smith, 2001) were used to ensure a proper choice of threshold. Figures 2 and 3 present the resulting means and standard deviations of the 90th–99th percentiles.
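The text does not say which optimiser was used for the maximum-likelihood fits. One compact option, shown here as a sketch, reparametrises with θ = ξ/σ: for fixed θ the likelihood is maximised by ξ(θ) = mean of log(1 + θy) and σ = ξ(θ)/θ, so a one-dimensional grid search over θ suffices. The grid bounds, the restriction ξ ≥ −1, and the function names are our assumptions; the 20% rule follows the text.

```python
import math


def fit_gpd_profile(excesses, s_max=20.0, steps=2000):
    """Approximate ML fit of a GPD to positive excesses via a profile grid.

    With theta = xi / sigma fixed, xi = mean(log(1 + theta * y)) and
    sigma = xi / theta maximise the likelihood, so only theta is searched.
    The grid covers s = theta * max(y) in (-1, s_max]; fits with xi < -1 are skipped.
    """
    n = len(excesses)
    y_max = max(excesses)
    mean_y = sum(excesses) / n
    # theta -> 0: exponential limit with sigma = mean excess and xi = 0
    best_loglik, best_sigma, best_xi = -n * math.log(mean_y) - n, mean_y, 0.0
    for i in range(1, steps + 1):
        s = -1.0 + i * (s_max + 1.0) / steps
        if abs(s) < 1e-9:
            continue
        theta = s / y_max
        xi = sum(math.log1p(theta * y) for y in excesses) / n
        if xi < -1.0:
            continue
        sigma = xi / theta
        loglik = -n * math.log(sigma) - n * (1.0 + xi)   # profile log-likelihood
        if loglik > best_loglik:
            best_loglik, best_sigma, best_xi = loglik, sigma, xi
    return best_sigma, best_xi


def annual_gpd_percentile(sample, p, frac=0.2):
    """GPD-based 100p-th percentile for one year, fitting the largest `frac` of values."""
    y = sorted(sample, reverse=True)
    n_total = len(y)
    u = y[max(2, int(frac * n_total))]       # threshold just below the kept values
    excesses = [v - u for v in y if v > u]
    n_u = len(excesses)
    tail = (n_total / n_u) * (1.0 - p)       # (N / N_u)(1 - p) in Equation (7)
    if not 0.0 < tail <= 1.0:
        raise ValueError("p must correspond to a level above the threshold")
    sigma, xi = fit_gpd_profile(excesses)
    if xi == 0.0:
        return u - sigma * math.log(tail)
    return u + (sigma / xi) * (tail ** (-xi) - 1.0)
```

A grid search is crude but cannot diverge; a production analysis would more likely use a dedicated extreme-value package together with the diagnostic plots mentioned above.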

Figure 2.

Estimated mean values (a), (c), and (e), and standard deviations (b), (d), and (f) of GPD-based (solid line) and ordinary (dotted line) percentiles. The percentiles were computed from simulated data with a heavy-tailed (a), (b) or a light-tailed (c), (d) distribution, or a mixture of the two distributions (e), (f)

Figure 3.

Estimated mean values (a) and standard deviations (b) of GPD-based (solid line) and ordinary (dotted line) percentiles. The percentiles were computed from rainfall data obtained by running the weather generator EARWIG for a 10 × 10 km area around Heathrow Airport in the UK

As illustrated, the means of the ordinary and GPD-based percentiles were almost identical in all simulations. Inasmuch as the former percentiles were unbiased, we concluded that the GPD-based percentiles were also practically unbiased. The differences in standard deviations were more pronounced, at least for the higher percentiles. Regardless of the probability distribution of the simulated precipitation records, we found that the GPD-based estimators of such percentiles had higher precision than the ordinary percentiles. Closer examination of the probability distribution of the two types of percentiles also indicated that the GPD-based estimators were less skewed. Together, these results showed that GPD-based percentiles performed better than ordinary percentiles for a wide range of underlying distributions of daily precipitation records.

4.2. Performance of univariate trend estimators involving ordinary and GPD-based percentiles

Regressing annual percentiles on time represents a simple method for assessing temporal trends in the magnitude of extreme events. Here we examined the performance of such trend estimators based on ordinary and GPD percentiles, respectively.

The underlying data were generated using a simple rainfall model in which the number of rainy days each year had a normal distribution, and the amount of precipitation on rainy days was exponentially distributed (Zhang et al., 2004). Furthermore, the parameters of the normal and exponential distributions were selected to mimic precipitation data from West Terschelling meteorological station (53.22°N, 5.13°E; 7 m above sea level) in The Netherlands. Accordingly, the number of rainy days per year was assumed to be normal with mean 133 and standard deviation 17, and the average precipitation on rainy days was set to 5.6 mm for the first year of the simulated time period.

Precipitation records representing 100-year time periods were generated by concatenating statistically independent one-year datasets. Moreover, trends with slope β2 were introduced by letting the expected amount of precipitation µ, and hence all the percentile values y(p), change linearly with year t:

equation image(9)
equation image(10)

We have already noted that GPD-based percentiles performed better than ordinary percentiles for a wide range of underlying probability distributions. In the present simulation experiments, we observed an analogous difference between trend estimators based on the two types of percentiles. The results given in Table I were obtained by regressing annual 98th percentiles on time, and they show that, regardless of the true slope of the trend line, the standard deviation was lower for the GPD-based estimators than for the estimators based on ordinary percentiles. In addition, all the investigated trend estimators were practically unbiased, and the mean square error was dominated by random errors. This provided additional support for GPD-based approaches.
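Each replicate in these experiments yields one regression slope, and Table I summarises many such slopes. A sketch of both steps follows; the function names are hypothetical, and the convention that RMSE averages squared errors with divisor R (the number of replicates) while the standard deviation uses R − 1 is our assumption.

```python
import math
import statistics


def ols_slope(t, y):
    """Least-squares slope of y (e.g. annual 98th percentiles) regressed on t (years)."""
    t_bar, y_bar = statistics.fmean(t), statistics.fmean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx


def summarise_estimates(estimates, true_value):
    """Bias, RMSE and standard deviation of replicated slope estimates."""
    bias = statistics.fmean(estimates) - true_value
    rmse = math.sqrt(statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    sd = statistics.stdev(estimates)
    return bias, rmse, sd
```

With these conventions RMSE² = bias² + ((R − 1)/R)·sd², so when the bias is small, RMSE and standard deviation nearly coincide, as they do in Table I.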

Table I. Bias, root-mean-square error (RMSE), and standard deviation of temporal trends computed by regressing GPD-based and ordinary 98th percentiles on time
| Trend slope, % | Trend slope | Bias, GPD | RMSE, GPD | Std. dev., GPD | Bias, ordinary | RMSE, ordinary | Std. dev., ordinary |
|---|---|---|---|---|---|---|---|
| 5 | 0.0110 | −0.00084 | 0.00989 | 0.00986 | −0.00114 | 0.01192 | 0.01187 |
| 10 | 0.0219 | −0.00019 | 0.01003 | 0.01004 | −0.00024 | 0.01231 | 0.01231 |
| 15 | 0.0329 | −0.00052 | 0.01019 | 0.01018 | −0.00019 | 0.01266 | 0.01267 |
| 20 | 0.0438 | −0.00092 | 0.01050 | 0.01046 | −0.00049 | 0.01277 | 0.01276 |
| 25 | 0.0548 | −0.00030 | 0.01104 | 0.01104 | 0.00033 | 0.01362 | 0.01363 |
| 35 | 0.0767 | −0.00100 | 0.01081 | 0.01077 | −0.00018 | 0.01336 | 0.01337 |
| 40 | 0.0876 | −0.00143 | 0.01134 | 0.01126 | −0.00075 | 0.01375 | 0.01374 |
| 45 | 0.0986 | −0.00148 | 0.01184 | 0.01175 | −0.00040 | 0.01426 | 0.01426 |
| 50 | 0.1095 | −0.00085 | 0.01222 | 0.01220 | 0.00058 | 0.01519 | 0.01519 |
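The two slope columns of Table I appear to be linked through the rainfall model of Section 4.2. For exponential amounts with mean µ, the 100p-th percentile is −µ ln(1 − p), about 21.9 mm for µ = 5.6 mm and p = 0.98. Reading the first column as the total percentage change in this percentile over the 100 simulated years reproduces the second column in mm per year. This reading is our inference from the numbers, not a statement in the text.

```python
import math


def percent_to_slope(percent_over_period, mu_first_year=5.6, p=0.98, years=100):
    """Slope (mm/year) of a 100p-th percentile that changes by `percent_over_period` % over `years`.

    Assumes exponential rainy-day amounts, for which the percentile is -mu * ln(1 - p).
    """
    y_p = -mu_first_year * math.log(1.0 - p)      # about 21.9 mm with the defaults
    return (percent_over_period / 100.0) * y_p / years
```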

4.3. Performance of univariate trend tests relying on models with time-dependent GPD parameters

The time dependence of GPD parameters and percentiles can be modelled in a parametric or non-parametric fashion. The simulation experiments described here aimed to investigate how this modelling can influence the precision and accuracy of univariate trend estimators. In particular, we compared the mean and standard deviation of trend estimators derived from models in which precipitation percentiles were either: (1) linearly increasing; or (2) block-wise constant.

To simplify the simulations, we assumed that each year has a fixed number of rainy days, which we set to 133, the mean value at West Terschelling meteorological station. Furthermore, we assumed that the daily rainfall amounts were exponentially distributed, implying that the excesses over any given threshold were also exponentially distributed with the same mean. The mean amount of precipitation on rainy days was assumed to increase linearly with time.
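The step from exponential daily amounts to exponential excesses is the memoryless property, P(X > u + y | X > u) = exp(−y/µ) = P(X > y). A quick numerical check follows; the sample size, thresholds, and seed are arbitrary choices of ours.

```python
import random
import statistics


def mean_excess(sample, u):
    """Average of the excesses y - u over the threshold u."""
    return statistics.fmean([y - u for y in sample if y > u])


def simulated_mean_excess(mean=5.6, u=10.0, n=200_000, seed=0):
    """Mean excess over u for n exponential draws with the given mean."""
    rng = random.Random(seed)
    sample = [rng.expovariate(1.0 / mean) for _ in range(n)]
    return mean_excess(sample, u)
```

Whatever threshold is chosen, the mean excess stays close to the 5.6 mm mean of the daily amounts, up to sampling error.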

The entire simulation experiment comprised 1000 data series, each representing 100 years of daily precipitation records, and the block size was varied from 1 to 50 years. For each series and block size, we estimated the GPD parameters for all blocks, calculated percentiles according to Equation (7), and then estimated the trend slope by regressing the derived percentiles on time. In addition, we computed maximum-likelihood estimates for a GPD model in which the scale parameter varied linearly with time and the shape parameter was constant (Smith, 1989, 1999, 2001, 2003). The estimated parameters were used to derive percentiles according to Equation (7) and to determine the trend slope of the percentile series.
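The block-wise estimator can be sketched as follows. Several details are our assumptions: the data follow the setup above with an illustrative 50% increase in mean over 100 years; the shape is fixed at ξ = 0 because the simulated amounts are exponential (the study estimated GPD parameters); the threshold leaves the largest 20% of the pooled values in each block; and each block's percentile is assigned to every year in that block before regression. Function names are hypothetical.

```python
import math
import random
import statistics


def simulate_years(n_years=100, days=133, mu0=5.6, rel_increase=0.5, seed=0):
    """Exponential rainy-day amounts whose mean rises linearly by `rel_increase` over the period."""
    rng = random.Random(seed)
    return [[rng.expovariate(1.0 / (mu0 * (1.0 + rel_increase * t / n_years)))
             for _ in range(days)] for t in range(n_years)]


def blockwise_trend(years_data, block_size, p=0.98, frac=0.2):
    """Trend slope from block-wise constant percentile estimates.

    Amounts are pooled within each block, an exponential tail (xi = 0,
    sigma = mean excess) is fitted above the threshold that leaves the largest
    `frac` of pooled values, Equation (7) gives the block percentile, and that
    value is assigned to every year in the block before regressing on year.
    """
    years, percentiles = [], []
    for start in range(0, len(years_data), block_size):
        block = years_data[start:start + block_size]
        pooled = sorted((v for year in block for v in year), reverse=True)
        n_total, n_u = len(pooled), max(1, int(frac * len(pooled)))
        u = pooled[n_u]
        sigma = statistics.fmean(pooled[:n_u]) - u      # ML estimate of sigma when xi = 0
        y_p = u - sigma * math.log((n_total / n_u) * (1.0 - p))
        years.extend(range(start, start + len(block)))
        percentiles.extend([y_p] * len(block))
    t_bar, y_bar = statistics.fmean(years), statistics.fmean(percentiles)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, percentiles))
    sxx = sum((t - t_bar) ** 2 for t in years)
    return sxy / sxx
```

Repeating `blockwise_trend(simulate_years(seed=s), b)` over many seeds s and block sizes b gives the kind of mean and standard deviation summarised in Table II, although the exact settings behind that table are not reproduced here.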

When analysing data containing a linear trend, it should be optimal to use trend estimators that are specifically designed to detect such trends. This was partly confirmed by our simulations (Table II). Estimators derived from models in which the scale parameter of a GPD varied linearly with time had higher precision than estimators derived from models in which the distribution of the precipitation amount was assumed to be stepwise constant. However, it should be mentioned that percentile estimators based on Equation (7) are biased, because they do not take into account the possibility of a trend in the probability of exceeding the user-defined threshold. The non-parametric trend estimators offered considerable accuracy and also relatively good precision, regardless of the block size, and thus, they represent a viable option when it is neither feasible nor desirable to set up a parametric model of the trend function of the scale parameter.

Table II. Mean and standard deviation of the slope estimates of linear trends in the 98th percentile. Columns from the left: true trend slope (TTS); ML estimator of the linear trend (LT); and trend estimators for the stepwise increasing precipitation distributions, BS1, BS3, …, BS50, where BS denotes block size and the number gives the number of years in each block
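|  | TTS | LT | BS1 | BS3 | … | … | … | BS50 |
|---|---|---|---|---|---|---|---|---|
| Std. dev. | 0 | 0.0092 | 0.0103 | 0.0103 | 0.0102 | 0.0102 | 0.0104 | 0.0113 |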

5. Assessment of observational data from a network of stations

We used our proposed two-step procedure to analyse the dataset presented in Figure 1 for temporal trends in extremes. This was done by first deriving GPD-based percentiles for each year and station and then feeding those percentiles into a multivariate MK test. Daily precipitation totals exceeding the threshold were declustered with an interval of one day to ensure statistical independence of consecutive observations.
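The text does not spell out the declustering rule. One common reading, runs declustering with a run length of one day, treats exceedances on consecutive days as a single cluster and keeps only the cluster maximum. The sketch below follows that assumption; the function name is hypothetical.

```python
def decluster_runs(amounts, u, run_length=1):
    """Runs declustering of a daily series.

    Exceedances of u separated by fewer than `run_length` non-exceedance days
    belong to the same cluster; only each cluster's maximum is returned.
    """
    peaks = []
    current_max = None
    gap = 0                          # non-exceedance days since the last exceedance
    for x in amounts:
        if x > u:
            if current_max is not None and gap < run_length:
                current_max = max(current_max, x)      # extend the current cluster
            else:
                if current_max is not None:
                    peaks.append(current_max)          # close the previous cluster
                current_max = x
            gap = 0
        else:
            gap += 1
    if current_max is not None:
        peaks.append(current_max)
    return peaks
```

For the series 0, 12, 15, 3, 11, 0, 0, 20, 18 mm with u = 10 mm, the two-day events 12/15 and 20/18 each contribute one peak, giving [15, 11, 20].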

Figure 4 illustrates ordinary and GPD-based 98th percentiles for one of the investigated stations (West Terschelling) in The Netherlands. As in the simulation experiments on the precision and accuracy of percentile estimators, we found less variation in the GPD-based percentiles than in the ordinary percentiles. In particular, the GPD-based series had fewer outliers. Similar patterns were observed at several of the other investigated stations.

Figure 4.

Annual estimates of 98th percentiles of the amount of daily precipitation at West Terschelling station (The Netherlands). The solid line represents GPD-based percentiles, and the stars indicate ordinary percentiles

The difference between the ordinary and GPD-based percentiles was further demonstrated by univariate MK tests for monotonic trends at the individual stations of the network. Some trends emerged more clearly in the GPD-based percentiles because their interannual variation was generally smaller. For example, five stations had a significant upward trend in the GPD-based 99th percentiles, but there were no significant trends at all in the ordinary percentiles (Figure 5). The advantage of using GPD-based estimators was less apparent for the lower percentiles. Figure 6 illustrates the spatial pattern in the detected trends.

Figure 5.

Achieved significance levels (p-values) of Mann-Kendall tests for monotonic increasing trends in time series of 90th, 95th, 98th, and 99th percentiles. The data came from 12 meteorological stations in The Netherlands

Figure 6.

Achieved significance levels (p-values) of Mann-Kendall tests for monotonic increasing trends in time series of ordinary (a) and GPD-based (b) 98th percentiles. The data originated from 12 meteorological stations in The Netherlands. Marker symbols denote p-value classes: ● 0–0.05; other symbols 0.05–0.1, 0.1–0.25, and > 0.25

The difference between ordinary and GPD-based percentiles emerged even more clearly in the multivariate MK tests for overall monotonic trends at the investigated stations. As shown in Table III, there was a dramatic difference in the achieved significance levels, especially for the higher annual percentiles of daily precipitation records.

Table III. P-values of multivariate Mann-Kendall tests for monotonic trends in time series of annual 90th–99th percentiles of daily precipitation records from 12 meteorological stations in The Netherlands
| Percentile | p-value based on ordinary percentiles | p-value based on GPD percentiles |
|---|---|---|

6. Conclusions and discussion

Two features of our trend test improve on the methods currently preferred for analysing trends in meteorological extremes at a network of stations. First, we used a parametric extreme value model to achieve the greatest possible precision in our annual estimates of the probabilities of extreme meteorological events. Second, we fed such annual estimates into a multivariate trend test that can accommodate statistically dependent data from a network of stations. It is also worth noting that the test we developed is easy to comprehend because it adheres to the climatological tradition of computing annual index values (Groisman et al., 1999; Haylock and Nicholls, 2000; Manton et al., 2001; Klein Tank and Können, 2003; Griffiths et al., 2003; Haylock et al., 2005; Moberg and Jones, 2005; Goswami et al., 2006).

As emphasised in the introduction, the use of parametric extreme value models to estimate tail probabilities is strongly supported in the statistical literature (Gumbel, 1958; Leadbetter et al., 1983; Beirlant et al., 2004). It is well established that estimators tailored to specific parametric models can have smaller variance than distribution-free estimators. This difference in variance is often pronounced when data are scarce, and extreme events are, by definition, infrequent. Our simulation experiments produced the anticipated results: the sample variance of the GPD-based percentile estimates was considerably smaller than that of ordinary, purely empirical percentiles (Table I).

A potential drawback of parametric modelling is that the probabilities of extreme events may be systematically over- or underestimated if the extreme value model is not fully correct. However, our study showed that, from the standpoint of trend detection, bias is not a major problem. Figure 1 clearly shows that the large interannual variation in the intensity of extreme meteorological events is the main obstacle to successful assessment of temporal trends in such events. This conclusion was confirmed by the simulation experiments summarised in Table I. When the mean squared error (MSE) of the GPD-based trend slope estimates was decomposed into variance and squared bias, it was apparent that the MSE was dominated by random errors, whereas the squared bias played a minor role. In addition, we found that our procedure also performed well when the analysed data followed a mixture distribution representing periods with different physical conditions for precipitation formation.

If temporal changes in the parameters of an extreme value distribution can be described by a simple mathematical function, and a single station is taken into consideration, it is feasible to construct parametric tests for trends in extremes (Smith, 1989; Davison and Smith, 1990; Coles and Tawn, 1990; Rootzen and Tajvidi, 1997). In the present study, we used a non-parametric trend test because we wanted to avoid making uncertain assumptions about the shape of the trend curve. Some investigators have previously emphasised that non-parametric techniques are particularly suitable for exploratory analyses of trends in extremes (Davison and Ramesh, 2000; Hall and Tajvidi, 2000; Pauli and Coles, 2001; Chavez-Demoulin and Davison, 2005; Yee and Stephenson, 2007). Furthermore, resampling has been utilised to assess the uncertainty of the estimated trend curves (Davison and Ramesh, 2000). Our method goes one step further by providing a formal trend test. In addition, it is designed to detect any monotonic change over time, which makes it well suited for studies of climate change because the atmospheric concentration of greenhouse gases has long been increasing.

The multivariate MK test that we selected for the final trend assessment is especially convenient for evaluating trends at a network of stations, because it can accommodate statistically dependent time series of data. In particular, it can be noted that our procedure does not require any explicit modelling of the multivariate distribution of the meteorological observations at the investigated sites. The only thing that matters in such an MK test is the pattern of plus and minus signs that is observed when estimates of tail probabilities are compared for all possible pairs of years at each of the stations. Furthermore, our approach makes it possible to determine the statistical significance of an overall trend in meteorological extremes by employing a fully automated procedure developed by Hirsch and Slack (1984).

A potential weakness of our procedure, as well as all other methods presently used to assess trends in climate extremes, is related to the presence of long-term cyclic patterns or autocorrelations in the analysed data. The multivariate MK test assumes that the vectors used as inputs are temporally independent. When computing tail probabilities separately for each year, it is thus assumed that there is no correlation between years. The robustness of our test to serial correlation can be enhanced by computing tail probabilities over periods of two or more years, or by reorganising annual records into a new matrix with larger time steps (Wahlin and Grimvall, 2009). However, there is a trade-off between the length of the mentioned period and the power of the trend tests.

Another potential limitation of our technique is associated with the fact that it is not derived from any optimality criterion. In particular, there may be loss of power because there is no parametric modelling of the temporal dependence. However, our simulations of such effects showed that the loss of power is moderate (Table II), and therefore, we conclude that this weakness does not outweigh the advantages of our procedure.

Finally, it can be noted that our case study provided further evidence of the strength of our procedure. The results presented in Table III demonstrate that our test was able to detect trends in precipitation extremes that would have been overlooked if we had based our inference on ordinary percentiles that did not rely on any extreme value model.


The authors are grateful for financial support from the Swedish Environmental Protection Agency.

A.1. Appendix

Multivariate MK test. Let $\{Z_{gi},\ i = 1, 2, \dots, n;\ g = 1, 2, \dots, m\}$ be $m$ time series of data. We form the test statistic

$$T = \sum_{g=1}^{m} T_g, \qquad T_g = \sum_{1 \le i < j \le n} \operatorname{sgn}(Z_{gj} - Z_{gi}).$$

Then the test statistic $T$ is approximately normal with zero mean and estimated variance

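$$\widehat{\operatorname{Var}}(T) = \sum_{g=1}^{m} \sum_{h=1}^{m} \hat{\sigma}_{gh},$$

$$\hat{\sigma}_{gh} = \frac{1}{3}\left[K_{gh} + 4\sum_{i=1}^{n} R_{gi} R_{hi} - n(n+1)^2\right], \qquad K_{gh} = \sum_{1 \le i < j \le n} \operatorname{sgn}\left[(Z_{gj} - Z_{gi})(Z_{hj} - Z_{hi})\right],$$

where $R_{gi}$ denotes the rank of $Z_{gi}$ within series $g$ (Hirsch and Slack, 1984).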